PREFACE


These are days of planning, and in view of the 
fact that statistics play an important role in 
national organization and planning, a book 
like this has at this moment a special interest. 

The author has spent the past ten years of his life in the Stock Exchange doing statistical work of an intensive nature. During this period he has always felt the need of presenting to businessmen in general an exposition of statistical method more rationalized than is usually found in textbooks on the subject. The text, therefore, develops, in the simplest possible manner, the statistical techniques, first for the treatment of numerical data and then for the drawing of inferences therefrom. In the development of this treatise the author has always had the needs and requirements of the businessman in view. The author realizes that the businessman may not have the preparation in algebra and calculus required for various statistical calculations. On this assumption the author has throughout the work given calculations involving the use of arithmetic only.



At every stage the methods are illustrated in their application to a variety of economic and business problems. Illustrative materials and problems have been chiefly taken from the field of Indian economics and business. In certain cases, however, assumed or hypothetical data have been used, and where this is done it is indicated by a footnote.

To introduce the businessman to calculation by logarithms, an explanatory chapter dealing with its practical aspects has been given at the end of the book.

The author is thankful to his numerous friends 
and associates in Clive Street for stimulating 
him with encouragement all through the work. 


A. K. SUR 
CHAPTER I

STATISTICS IN BUSINESS 

When used in the plural, ‘Statistics’ means numerical facts or data in any department of enquiry placed in relation to each other. In its singular sense, however, it refers to the subject itself, which is defined in the Oxford Dictionary as “the department of study that has for its object the collection and arrangement of numerical facts or data, whether relating to human affairs or to natural phenomena.” This is, however, a narrow definition of the Science of Statistics, for Statistics concerns itself not merely with the collection and arrangement of numerical facts or data, but with their analysis and interpretation as well.

Taken in isolation, statistical data fail to tell their own stories. But when properly collated and co-ordinated for comparison with appropriate or cognate groups of items they become voluble and shed much light on problems that would otherwise remain obscure and unintelligible. In this way statistical data have made intelligible to us problems in many a practical affair of life that would otherwise have remained devoid of any significance. Indeed, “the use of statistics in the business of running the country through its political, commercial and social institutions, in those activities that determine the health, wealth and happiness of mankind, is the oldest and the most considerable use.”

In the business sphere particularly, statistics have their uses in diverse ways. Business itself, it may be noted, is founded upon estimates and probabilities. The businessman sizes up his production on the basis of the probable demand for it. Such estimates are always based upon past records and experience, as also upon the changing tastes of the times, and if there be any error or blunder in the estimate, the businessman is apt to come to grief. Success in business rests upon estimates approximating as nearly as possible to actual results.

Discrepancy between estimates and results 
cannot be eliminated unless the estimates are 
based upon some scientific methods. And it is 
the concern of the Science of Statistics to point 
out the correct methods. 

It is for this reason that we find progressive businessmen the world over realizing more and more the importance of setting up statistical branches as necessary adjuncts to their business organizations.
It is the function of such statistical research departments to collect, collate and co-ordinate facts and figures relating not only to the particular business of the house, but also to competitive businesses and to trade and finance generally. Data thus collected help the business houses concerned in exploring new markets and fresh avenues of income, as also in eliminating competition. For the internal organization of the house itself such data are of great value for the solution of many a problem of production, selling, management and budgetary control.


Furthermore, such statistical investigations provide the firm concerned with a life and death test of its own progress, for the nature and effects of any changes observed, when properly analyzed and checked, are helpful in keeping the machine moving much more smoothly or saving it from peril.



CHAPTER II 

GATHERING OF DATA 

The first step in all statistical investigations is, of course, the gathering of appropriate data for purposes of estimation. Utmost care should be devoted to this aspect of the work, inasmuch as if the data are imperfect or not judiciously selected, the results or conclusions are likely to be erroneous and misleading. As a point of fact, data, to be of real value for statistical purposes, must be precise, accurate and stable. To ensure this, the terms to which the figures relate should have a precise connotation, and that connotation or definition should not be changed or modified at any stage during the whole process of gathering the data. For instance, although such terms as ‘manufacturing concerns,’ ‘trade,’ ‘profit,’ ‘working expenses,’ etc., carry varying significance in common usage, when used for statistical treatment they should have a definite connotation, and that connotation should be strictly adhered to throughout the same enquiry. In other words, variables (i.e., entities like ‘profits,’ ‘sales,’ ‘outputs,’ etc., which assume different values in different periods and circumstances) should remain fixed in significance and connotation throughout the enquiry. Lastly, we must be definitely precise as to what particular region or place the statistics refer to, and to what instant or period of time they relate.

Statistical data can be gathered either from published sources of information or by ad hoc enquiries. Published sources abound in many fields of economic investigation. Firstly, there are the official blue books periodically issued by the various government departments, generally as a by-product of certain administrative operations. Secondly, the reports of the various Royal Commissions and other similar statutorily appointed bodies embody the results of many statistical enquiries specifically undertaken by them. Thirdly, there are the reports of the Reserve Bank of India and of various non-official organizations like the Chambers of Commerce, the Stock Exchanges, the Central Jute Committee, the Indian Jute Mills Association, the Millowners’ Associations, and other similar bodies. Trade gazettes and some of the financial papers also feature many statistical Index Numbers specially prepared by them.

As regards ad hoc enquiries, or special types of investigations made by individuals, the Questionnaire method appears to be the best. ‘Questionnaire forms’ for such purposes must be compiled with the utmost care and thought. These should be as simple as possible, and the questions should be so carefully framed as to elicit either a precise answer like “Yes” or “No,” or a definite number. Where ambiguity may possibly arise, the questions should be clearly defined. Further, the questions asked must be of an inoffensive character, and must not be so framed as to deter the person to whom they are addressed from answering all the facts about them candidly. Lastly, the form should contain enough blank space to accommodate the answers even when penned in large handwriting. All such ‘questionnaire forms’ should be headed with appropriate instructions for filling in the data, so as to make mistakes difficult.

Classes of Data

For all practical purposes, statistical data may be classified under two heads: (i) a Time Series, that is to say, a series in which items or observations are distributed in relation to some unit of time. This includes such items as monthly totals of jute mill production, company profits or crop production year by year, weekly earnings of railways and so forth.
Time series again are of two kinds. In the first kind the figures relate to some quantity which is measured at a particular instant of time, e.g., the census of population figures in a country on a given date in a particular year. In the second kind the figures relate to some particular quantity over a number of time intervals, e.g., the figures of total production of jute manufactures month by month.

(ii) A Frequency Distribution, that is to say, a series in which the items or observations are distributed with respect to some physical characteristic, e.g., the distribution of jute mills according to the number of looms, the distribution of companies according to the size of capital or profits, the distribution of wage earners according to weekly wages received, and so forth. Data of a frequency distribution may be either of (a) the Continuous type, i.e., one that may take any value ranging between the lowest and the highest, such as farm costs of production; or of (b) the Discrete type, i.e., one with values distinct and separate, as, for example, the number of workers in a factory.

Census & Sample Enquiries 

When the whole information about the subject of enquiry is collected we call it a Census enquiry, but when only a part of it is sought we call it a Sample enquiry. In the study of business or economic problems, a sample enquiry is generally found to be more expedient than a census enquiry, because it saves time, money and labour. The process involved in the selection of data for a sample enquiry is known as Sampling.

8 STATISTICS—HOW TO HANDLE THEM

Such a sample is generally drawn purely at random from the field of enquiry, but the one thing that must always be clearly borne in mind when a sample is drawn is that the examples picked should be really representative of the entire field of investigation. The methods usually employed for random sampling are the lottery system, the system of selecting every tenth example (for instance, from a directory), or pricking the required number of cases in a list while blindfolded. One advantage of these methods over personal selection is that they do not lead to unconscious bias. The maxim to be followed in this connection is that the larger the number of representative cases or examples, the more accurate and satisfactory will be the standard.
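For readers who keep their lists on a computer, the lottery system and the every-tenth method can be imitated in a few lines of Python. This is a minimal sketch and not part of the original text. The directory of 500 firms is hypothetical, and the function names are illustrative. The every-tenth method is assumed to start from a randomly chosen entry among the first ten, a common refinement so that the starting point is not picked by hand.

```python
import random

# A hypothetical directory of 500 firms, for illustration only.
DIRECTORY = [f"Firm {i:03d}" for i in range(1, 501)]


def lottery_sample(population, size, seed=None):
    """Lottery system: draw `size` entries without replacement, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, size)


def every_kth(population, k, seed=None):
    """Every k-th entry, starting from a randomly chosen entry among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]


by_lot = lottery_sample(DIRECTORY, 50, seed=1)
by_tenth = every_kth(DIRECTORY, 10, seed=1)
print(len(by_lot), len(by_tenth))  # prints: 50 50
```

A fixed `seed` makes a draw repeatable for checking. Omit it to obtain a fresh draw.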

A question that may naturally spring up in this connection is: what warrant is there that estimates made from a sample are as good as those from the ‘total population’ (the entire field of enquiry is in Statistics known as the ‘Population’ or the ‘Universe’)? Mathematicians have proved that “if a moderately large number of items be chosen at random from among a large mass, such numbers are on the average almost sure to have the same characteristics of the large group, and the data so obtained can be safely used as a base for comparison with all other examples of the same kind.” This theory is known as the Theory of Probability or the Law of Statistical Regularity. (In such selection of cases, it should however be seen that the examples picked are really representative of the entire field of investigation.)
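A simulated universe shows the Law of Statistical Regularity at work. The sketch below is an illustration only. It generates a hypothetical universe of 10,000 profit figures, then draws random samples of increasing size and compares each sample mean with the mean of the whole universe. The larger samples will generally come closer, although no single draw is guaranteed to.

```python
import random
import statistics

# Hypothetical "universe": annual profits (Rs.) of 10,000 concerns, generated for illustration.
rng = random.Random(42)
universe = [rng.randint(1_000, 50_000) for _ in range(10_000)]


def sample_mean(population, size, rng):
    """Mean of a random sample of `size` items drawn without replacement."""
    return statistics.mean(rng.sample(population, size))


true_mean = statistics.mean(universe)
for size in (10, 100, 1_000):
    estimate = sample_mean(universe, size, rng)
    print(f"sample of {size:>5}: mean {estimate:,.0f} against universe mean {true_mean:,.0f}")
```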


Now, in the course of our enquiry we may meet with some items or observations that manifest unusual or abnormal values. But by another law, known as the Law of the Permanence of Small Numbers, we know that “when a particular or unusual characteristic occurs in a properly selected sample it may be expected that this same characteristic is likely to be present in the entire group from which the sample was taken in the same proportion as it is present in the sample itself. In other words, the characteristic is permanent, being present in every group similarly selected and composed of the same number of individuals as the original sample.”

Again, according to the Law of Inertia of Large Numbers, variations in one direction are apt to be offset or equalized by variations in another direction. The Moving Averages (see Ch. V) are particularly useful where such abnormalities are conspicuously evident. Lastly, it should be noted that there is a distinct relationship between the size and the degree of precision of a sample. As a point of fact, the degree of precision of a sample increases as the number of items in the sample is increased, and it varies in accordance with the square root of the number of items constituting the sample: to double the precision, the sample must be made four times as large.
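The square-root rule can be stated as a formula. Let σ be the standard deviation of the items in the universe (a measure of their dispersion) and n the number of items in a random sample. The standard error of the sample mean is then σ/√n. The short Python sketch below assumes simple random sampling from a large universe and a hypothetical σ of Rs. 2,000. It shows that each fourfold increase in n halves the error, that is, doubles the precision.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean: the universe's s.d. divided by the square root of n."""
    return sd / math.sqrt(n)


SD = 2_000  # hypothetical standard deviation of profits, Rs.
for n in (25, 100, 400):
    print(n, standard_error(SD, n))  # 400.0, then 200.0, then 100.0
```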


CHAPTER III

ARRANGING THE DATA 

Statistical data, in their crude form, having been gathered, the statistician's next job is to set them out in tabular form in such a manner as to make them suitable for appropriate comparison and to bring out their significant features before the reader’s eye. When tabulating the data, the problem for which the statistical enquiry is being conducted should be uppermost in the statistician’s mind, and with that specific problem in mind he should devise how best the data can be presented to make them clear, convenient, intelligible and readable. In brief, the effectiveness of presentation and the immediacy of need should be the cardinal principles of all tabulation work. (Be it noted, however, that after the data for the immediate need have been tabulated, the original material should not be thrown away, as the same may serve a different useful purpose in future.)

Rules for Tabulation 

1. The statistical table should not be made too large and cumbersome, or of a size that makes it difficult for the eye to catch its significance at a glance. When, however, a large table becomes inevitable, it should be split up into sections, and the different sections summed up in a separate subsidiary table. In this connection, Dr. Rhodes in his Elements of Statistics observes that a golden rule is to make out two tables instead of one if there is the least fear that one would contain such a mass of material as would tend to make the essential facts therein obscure and hidden.

2. The number of sections into which a 
column should be split up would, of course, 
depend upon the discretion of the statistician, 
but attention should always be paid to the 
relative importance of the various data, and 
the principle to be followed in this respect is to 
make the less important follow the more 
important. The usual method that is followed 
is to arrange the sub-divisions on the basis of 
some well-defined principles (e.g., period, size,
merit, function, alphabetical, geographical or 
spatial, species, etc.) in vertical form in the 
first column, and put the respective units of 
measurements against them horizontally. 

3. All comparative figures should be placed in a vertical row, for a horizontal line of figures is more tiresome to the eye. Where the totals are the things that principally concern us, they should be put at the head of the vertical formation, or on the left side of a horizontal tabulation if, of course, the comparables have been put horizontally.

4. Percentages loom large in business statistics and they are oftentimes incorporated into tabular statements. Where percentages are used, due care must be given to their calculation. They should be correctly worked out to a definite place of decimals, and where the same base is used they can be checked by adding the percentages, since they would collectively amount to 100 per cent (in actual practice a minor fractional variation is noticeable, but that is negligible). When, however, percentages are worked on different bases, they should not be added as above. In this case, the percentage for the total should be calculated on the actual totals of the columns concerned, as is shown in the table below:

Table I. Gross Earnings & Working Expenses of the McLeod Group of Railways for the Year Ended 31-3-45

Railways              Working Expenses  Gross Earnings   Per cent of
                                 (Rs.)           (Rs.)   Expenses to
                                                      Gross Earnings

Ahmadpur-Katwa                1,33,817        1,23,187        108.63
Bankura Damodar               1,69,162        1,49,785        112.93
Burdwan Katwa                 1,35,330        1,63,970         82.53
Kalighat Falta                2,20,773        2,14,277        103.03
Katakhal Lalabazar              34,282          68,561         50.00

Total                         6,93,364        7,19,780         96.33




When both actual figures and percentages or averages are placed in the same table, they should appear in close juxtaposition to each other.

5. If there be any gap against any item in any of the columns (that is to say, where such figures are not available), the totals thereof are not to be made up, or are to be shown in a different type from the rest.

6. Where numbers are composed of many digits, the terminal digits may be dropped by use of approximation*, but care should be taken that the figures do not lose their importance and value thereby. For instance, where the figures may be used for further tabulation or enquiry, they should always be shown in their entirety in actual numbers.

7. Large numbers should be separated by 
commas, and numbers in which fractional parts 
are involved, should be clearly separated by 
decimal points. Again, all such commas and 
decimal points in a column must be kept in 
proper alignment. 


*Such methods of approximation enable us to obtain correct results in regard to the calculation of percentages to the nearest 1 per cent.
8. Finally, all statistical tables must be
complete and self-explanatory. They should, 
as a point of fact, be accompanied by such 
explanatory notes as would leave no possible 
ambiguity in the interpretation of the meanings 
of the tables. They must have proper labels 
or titles to them at the top, and footnotes 
should be used to show sources of data and to 
point out factors of importance that cannot be 
otherwise incorporated in the body of the table. 
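Rules 4 and 7 can be tried out on the figures of Table I. The Python sketch below is an illustration added here, not the author's method, and its function names are hypothetical. For each railway and for the totals, it works out the percentage of expenses to gross earnings to one place of decimals. It prints the rupee figures with Indian comma grouping (as in 1,33,817), right-aligned so that commas and decimal points line up in each column. It also shows why percentages on different bases must not be combined: the simple mean of the five railway percentages comes to about 91.4, whereas the percentage worked on the column totals is about 96.3.

```python
# Table I figures (Rs.): (railway, working expenses, gross earnings).
RAILWAYS = [
    ("Ahmadpur-Katwa", 133817, 123187),
    ("Bankura Damodar", 169162, 149785),
    ("Burdwan Katwa", 135330, 163970),
    ("Kalighat Falta", 220773, 214277),
    ("Katakhal Lalabazar", 34282, 68561),
]


def indian_group(n: int) -> str:
    """Group digits the Indian way: last three, then pairs (e.g. 1,33,817)."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, groups = digits[:-3], [digits[-3:]]
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups)


def percent(part: float, base: float) -> float:
    """`part` expressed as a percentage of `base`."""
    return part / base * 100


def build_table(rows):
    """Return the table lines, the percentage on the totals, and the mean of row percentages."""
    row_pcts = [percent(exp, earn) for _, exp, earn in rows]
    lines = []
    for (name, exp, earn), pct in zip(rows, row_pcts):
        # Right-justified cells keep commas and decimal points aligned (rule 7).
        lines.append(f"{name:<20}{indian_group(exp):>10}{indian_group(earn):>10}{pct:>8.1f}")
    total_exp = sum(exp for _, exp, _ in rows)
    total_earn = sum(earn for _, _, earn in rows)
    # Rule 4: the percentage for the total is worked on the column totals ...
    total_pct = percent(total_exp, total_earn)
    lines.append(f"{'Total':<20}{indian_group(total_exp):>10}"
                 f"{indian_group(total_earn):>10}{total_pct:>8.1f}")
    # ... and not by averaging row percentages that rest on different bases.
    mean_of_rows = sum(row_pcts) / len(row_pcts)
    return lines, total_pct, mean_of_rows


lines, total_pct, mean_of_rows = build_table(RAILWAYS)
print("\n".join(lines))
print(f"Simple mean of the five row percentages: {mean_of_rows:.1f} (not a valid total)")
```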

Frequency Tables 

When the data belong to a Frequency Distribution, there will arise the necessity of arranging the data according to their respective magnitudes into what is known as a Frequency Table. Thus, suppose there is a group of 419 industrial undertakings, of which 20 earn a profit of not over Rs. 5,000 per annum; 30 earn a profit of over Rs. 5,000 but not exceeding Rs. 10,000; 35 earn from Rs. 10,001 to Rs. 15,000; 25 from Rs. 15,001 to Rs. 20,000; 40 from Rs. 20,001 to Rs. 25,000; 50 from Rs. 25,001 to Rs. 30,000; 60 from Rs. 30,001 to Rs. 35,000; 40 from Rs. 35,001 to Rs. 40,000; 50 from Rs. 40,001 to Rs. 45,000; and 69 from Rs. 45,001 to Rs. 50,000. The subjoined is the Frequency Table that would be made out from these crude data:


Groups  Classes (Rs.)    Class Marks   Frequencies    Cumulative
                               (Rs.)                 Frequencies
  1     Up to 5,000              ...            20            20
  2     5,001-10,000           7,500            30            50
  3     10,001-15,000         12,500            35            85
  4     15,001-20,000         17,500            25           110
  5     20,001-25,000         22,500            40           150
  6     25,001-30,000         27,500            50           200
  7     30,001-35,000         32,500            60           260
  8     35,001-40,000         37,500            40           300
  9     40,001-45,000         42,500            50           350
 10     45,001-50,000         47,500            69           419


Here the significance of certain terms used in connection with a Frequency Distribution should be explained. The difference between the highest value in the class below the one in question and the lowest value in the class above it is known as the Class Interval (also known as Interpolation). In other words, the class interval is the interval which sets bounds to each class of the frequency distribution. By Class Mark or Number is meant a value (generally the arithmetic mean of the class limits) which serves to designate the class.

The whole of the class range (that is to say, from Rs. 5,000 profit to Rs. 50,000 profit) is known as an Array, and the middle or central item in the array, which most closely corresponds to the magnitudes of all other examples in the array, is known as the Median. The median is located half-way down the array. An array differs from a series in that whereas an array is an arrangement on the basis of magnitudes, a series is not necessarily such; it is rather an arrangement in logical sequence.


When an array is divided into four equal parts, each of the three points of division is known as a Quartile. The quartile between the lower extreme and the median is called the Lower Quartile, and that between the median and the upper extreme is known as the Upper Quartile. Briefly stated, the first and the third quartiles are respectively called the Lower and Upper Quartiles, the second quartile being the median itself.


As will be perceived from the preceding table, a frequency table contains distinct columns for recording the serial number of the groups, the classes, the class marks and the frequencies of each group. In the last column, headed Cumulative Frequencies, the frequencies of the preceding classes have been cumulated, that is to say, superadded at each stage, and when this column appears in a frequency table, we call it a ‘Cumulative Frequency Distribution Table.’
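The cumulative column, and the location of the median and the quartiles, can be checked mechanically. The Python sketch below is offered as an illustration, and its function names are hypothetical. It rebuilds the Cumulative Frequencies of the 419 undertakings and finds the group in which the median and each quartile item falls. It assumes the common convention of placing these items at (n + 1)/4, (n + 1)/2 and 3(n + 1)/4. Some texts use n/4, n/2 and 3n/4 instead, which here lead to the same groups.

```python
from itertools import accumulate

# Profit classes of the 419 undertakings: upper class limits (Rs.) and frequencies.
UPPER_LIMITS = [5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000]
FREQUENCIES = [20, 30, 35, 25, 40, 50, 60, 40, 50, 69]


def cumulative(freqs):
    """Running totals: each class adds its frequency to those of all preceding classes."""
    return list(accumulate(freqs))


def class_containing(position, cum_freqs):
    """1-based group number of the class that holds the item at `position` in the array."""
    for group, cf in enumerate(cum_freqs, start=1):
        if position <= cf:
            return group
    raise ValueError("position lies beyond the total frequency")


cum = cumulative(FREQUENCIES)
n = cum[-1]  # total number of undertakings
# Assumed convention: the quartile items sit at (n + 1)/4, 2(n + 1)/4 and 3(n + 1)/4.
positions = {
    "Lower Quartile": (n + 1) / 4,
    "Median": (n + 1) / 2,
    "Upper Quartile": 3 * (n + 1) / 4,
}
for name, pos in positions.items():
    group = class_containing(pos, cum)
    print(f"{name}: item {pos:g} lies in group {group}, "
          f"the class ending at Rs. {UPPER_LIMITS[group - 1]:,}")
```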
In the study of statistical problems a distribution may be found to be either of the Symmetrical or the Asymmetrical type. It is said to be of the symmetrical type when the same number of observations or frequencies are found to be distributed at the same linear distance on either side of the midpoint of the frequencies, as illustrated in the following table:

Symmetrical Distribution of Profits Made by 114 Companies*

Profit Class        Frequency
Up to 5,000                 2
5,001-10,000               15
10,001-15,000              25
15,001-20,000              30
20,001-25,000              25
25,001-30,000              15
30,001-35,000               2
Total                     114

*Hypothetical data.

Further, a symmetrical distribution is said to be of the Normal type when the observations or frequencies are found to vary at a rate that would be theoretically expected in an infinite number of trials. Such a theoretical conception of its occurrence is derived from the mathematical law relating to the probable occurrence of chance phenomena in accordance with the binomial expansion. It simply means that “the values of a variate in a large number of cases tend to be distributed uniformly according to the mathematical law about the value that occurs the greatest number of times.” The following is an example of a Normal Frequency Distribution (constituting the binomial expansion of the tenth power):


Normal Frequency Distribution of Average Wages Earned by Workers in a Certain Factory*

Wages     Frequency
   65           120
   70           210
   75           252
   80           210
   85           120
   90            45
   95            10
  100             1

*Hypothetical data.

Under actual conditions, however, normal frequencies are seldom met with when dealing with economic problems. Rather, as a point of fact, in the majority of instances a representative sample of economic data will show positive distortion or skewness (see Ch. VI), and when normal frequency is noticed in such a distribution, the unrepresentativeness of the sample is justifiably to be suspected, and the sample should not, therefore, be relied upon as a basis for judgment relating to the characteristics of the group.

A distribution is called Asymmetrical when there are unequal numbers of items or frequencies lying to the left and right of the mode (see Ch. V).
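Both properties described in this section can be tested in a few lines. The sketch below needs Python 3.8 or later for `math.comb`, and its function names are illustrative. It checks that the frequencies of the symmetrical profit table mirror one another about the middle class. It also lists the eleven terms of the binomial expansion of the tenth power, 1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1, against which the printed wage frequencies can be compared.

```python
from math import comb

# Frequencies of the symmetrical profit table (114 companies, hypothetical data).
SYMMETRICAL = [2, 15, 25, 30, 25, 15, 2]
# Frequencies printed in the normal wage table (wage classes 65 to 100).
NORMAL_PRINTED = [120, 210, 252, 210, 120, 45, 10, 1]


def is_symmetrical(freqs):
    """True when equal frequencies lie at equal distances on either side of the midpoint."""
    return freqs == freqs[::-1]


def binomial_terms(power):
    """Coefficients of the expansion of (1 + 1)**power, from comb(power, 0) to comb(power, power)."""
    return [comb(power, k) for k in range(power + 1)]


tenth = binomial_terms(10)
print(is_symmetrical(SYMMETRICAL), sum(SYMMETRICAL))  # prints: True 114
print(tenth)
# The printed wage frequencies coincide with the last eight terms of the tenth power.
print(tenth[-len(NORMAL_PRINTED):] == NORMAL_PRINTED)
```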


CHAPTER IV 

VISUALIZING THE DATA 

One of the most familiar ways in which statistics can be presented to make them suitable for proper appreciation is by means of Graphs. To the businessman they are particularly helpful in presenting an effective visual picture of the trend of sales and purchases, price fluctuations, gross profits and expenses, turnover and net profits, as also of various kinds of records (for instance, those relating to the various departments, factories, finance and costing, etc.). Their chief importance to the statistician, however, rests on the fact that they facilitate the preliminary examination of a compilation, bringing readily to the eye the salient features of its tabulated results.

Graphs are constructed by plotting the statistical data on sectional papers (available in various scales) containing mutually perpendicular intersecting straight lines called Axes. The horizontal axis is usually called the x-axis, the vertical axis the y-axis, and their point of intersection the Origin. In plotting a straight-line graph it is necessary to know only two points and to join them by a straight line. A point is expressed by the symbol (a, b), where a indicates the distance along the x-axis and b the distance along the y-axis.

The statistician should never omit to label every graph with a short and appropriate description and to mark the scales along the axes, for it must be remembered that a graph, however neat, is quite worthless for any practical purpose unless it is provided with a label and scales.

Representation of Time Series 

When a time element exists in the data, we generally use the horizontal scale for the time element and the vertical scale for the magnitude of the items. A series of points at equal intervals is marked on the horizontal base line, corresponding to a series of years, or months, or whatever time interval is involved. The variable quantities are represented by vertical ordinates erected at the points on the horizontal scale corresponding to the particular instants of time to which the statistics relate. To represent them in the form of a graph, all that is necessary is to mark only the end of each ordinate by means of a point on the paper, and then to connect the consecutive points by means of straight lines. The result will be a curve. It is clear that any rise or fall in the magnitude of the items we seek to illustrate will be reflected by a corresponding rise or fall of the curve in the graph. As the curve can be extended with the entry of new time elements into the matter, such a curve is of great importance and usefulness in visualizing at a glance the progress of any business activity. Then again, different curves (either for different series of the same units or for two or more series of different units) can be plotted on the same chart, and so the businessman can at one glance compare the progress of one thing with another, or different aspects of the same thing. Further, if the periodic moving average is also plotted on the same chart, it should enable the businessman to perceive whether the short-term fluctuations have any permanent effect upon the soundness or otherwise of the business.
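A moving average for plotting alongside the original curve can be worked out as follows. This is a minimal Python sketch of a simple (unweighted) moving average. The monthly sales figures are hypothetical, and each average would normally be plotted against the middle month of its period. Moving averages themselves are treated in Ch. V.

```python
def moving_average(values, period):
    """Simple moving averages of `values`, each taken over `period` consecutive items."""
    if not 1 <= period <= len(values):
        raise ValueError("period must lie between 1 and the number of values")
    window = sum(values[:period])
    averages = [window / period]
    for i in range(period, len(values)):
        # Slide the window one step: add the newest item and drop the oldest.
        window += values[i] - values[i - period]
        averages.append(window / period)
    return averages


# Hypothetical monthly sales (Rs. thousand), for illustration only.
sales = [40, 46, 38, 52, 44, 50, 41, 55]
print(moving_average(sales, 3))
```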

When the vertical scale proves too large to be consistent with the size of the paper, we eliminate a part of the diagram and note this by indicating that the zero on the vertical scale does not coincide with the horizontal base line on which the time element is shown. But as diagrams are mainly constructed to help others in appreciating some statistical facts, it would be futile to represent several time series on the same chart if the graphs nearly superimpose on one another, or cross and recross. When two or more series of figures belonging to different units are plotted on the same chart, the need arises for more than one vertical scale, but this difficulty can easily be obviated by sacrificing the units involved and resorting to percentages instead. When percentages are thus plotted, only the relative figures are shown in the graph instead of the actual sizes of the figures, and, therefore, the diagram cannot pretend to show the original table properly. It should be noted that the base figure in such a series is 100 per cent, and this is emphasized by drawing a horizontal line through 100 per cent on the vertical scale.
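The conversion to percentages can be sketched as follows. Each series is expressed as a percentage of its own first item, taken as the base of 100 per cent, so that one series in tons and another in rupees can share a single vertical scale. The Python function name and both series are hypothetical, for illustration only.

```python
def as_percent_of_base(series, base_index=0):
    """Express every item as a percentage of the item chosen as base (= 100 per cent)."""
    base = series[base_index]
    if base == 0:
        raise ValueError("the base item must not be zero")
    return [round(value / base * 100, 1) for value in series]


# Hypothetical series in different units, for illustration only.
output_tons = [500, 550, 600, 575]           # tons of jute goods
wage_bill_rs = [20000, 21000, 24000, 26000]  # rupees
print(as_percent_of_base(output_tons))   # [100.0, 110.0, 120.0, 115.0]
print(as_percent_of_base(wage_bill_rs))  # [100.0, 105.0, 120.0, 130.0]
```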

When rises and falls are plotted on the chart in their actual magnitudes, it is called the Natural Scale Method. The one great demerit of such a method is that it does not enable the businessman to know what ratio the fluctuations of one period bear to those of another, for rises and falls of equal magnitudes are shown by the same vertical distance. For instance, if the fluctuation in one period be from Rs. 4 to Rs. 8 and in another period from Rs. 50 to Rs. 54, the curve would move up the vertical scale exactly the same distance in space, yet the percentage of increase in the first period was 100 per cent and in the second period only 8 per cent. When the natural scale is used, the base line should represent zero, otherwise the true perspective of the rise and fall would be lost.

When, however, we wish to avoid the calculation of percentages, and would like to show the ratios of the rise and fall instead of the rises and falls themselves in absolute terms, we plot them on what is known as the Logarithmic Scale. “To construct a Logarithmic Scale, we first find the logarithms of the numbers we desire to plot and divide our scale into such a number of equal divisions as will allow all the logarithmic numbers progressing in a uniform manner. The logarithms of the actual numbers are then plotted, instead of the numbers themselves.” It should, however, be clearly borne in mind that such a chart merely shows the ratios of rises and falls, and not anything else. The chief thing to remember in this connection is that whereas in a natural scale graph “the same length on the paper in any part of the scale is equivalent to the same number of units,” “in a ratio scale this is not the case; the length of the interval between two values on the scale is proportional to that between these two values,” that is, it depends on the ratio of the two values, so that equal ratios occupy equal lengths. As a matter of fact, “consecutive points on the scale corresponding to consecutive integers get closer and closer together as we increase the numbers from 1.”

A comparison of the Logarithmic Chart with 
the Natural Scale Chart often reveals much 
useful information. When the Logarithmic 
Scale is used, there should be no base line, 
otherwise fallacious conclusions will follow. 
Again, “there is no zero on logarithmic paper,
as ‘log 0’ is an indefinitely large negative
number” (printed logarithmic papers can be
obtained from the market). The Logarithmic
Scale is appropriately used for representing 
a “ series involving different units or a series 
of figures which change fundamentally as time 
goes on, increasing or decreasing at a great 
rate.” 
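The two properties of the ratio scale described above (equal ratios give equal lengths, and consecutive integers crowd together) can be seen numerically in the sketch below. It assumes, for illustration only, that one tenfold rise (one "cycle" of the paper) takes 10 units of length; the function name and the `units_per_cycle` parameter are hypothetical.

```python
import math

def log_position(value, units_per_cycle=10.0):
    """Distance of `value` above the point marked 1 on a ratio (logarithmic) scale.

    `units_per_cycle` is the assumed paper length of one tenfold rise.
    Zero and negative values cannot be plotted on such a scale.
    """
    if value <= 0:
        raise ValueError("there is no zero (or negative) point on logarithmic paper")
    return units_per_cycle * math.log10(value)

# A doubling occupies the same length wherever it occurs on the scale.
d1 = log_position(8) - log_position(4)
d2 = log_position(100) - log_position(50)
print(round(d1, 3), round(d2, 3))

# Gaps between consecutive integers shrink as the numbers increase from 1.
gaps = [log_position(n + 1) - log_position(n) for n in range(1, 6)]
print([round(g, 3) for g in gaps])
```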

Representation of Frequency Distribution 

In plotting a Frequency Distribution we take the x-axis for the representation of the class marks and the y-axis for the representation of the frequencies. But since each frequency is an integer and includes all the individuals within the class interval to which it applies, the frequency is customarily represented not by a single ordinate, but by a rectangle whose base is the class interval and whose height is numerically equal to the value of the frequency. The diagram formed by the frequency rectangles is called a Histogram, as distinguished from a Historigram, which is the representation of a historical or time series. From a glance at such a representation it would at once be clear that if the number of class intervals be increased, that is to say, if the magnitude of the class interval be made smaller, the rectangular blocks would become narrower and shorter, so that if the narrowing process be continued sufficiently far, a smooth curve would appear in place of the rectangular blocks. The natural corollary of this is that the rectangular blocks do not afford us a correct view of the situation.

[...] to that of the rectangles.
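The counting behind a histogram, and the effect of narrowing the class interval, can be sketched as follows. The data are hypothetical and the helper name is illustrative; each count returned is the height of one rectangle whose base is the class interval.

```python
def class_frequencies(values, lower, width, n_classes):
    """Count the values falling in each class [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - lower) // width)
        if 0 <= k < n_classes:          # values outside the classes are ignored
            counts[k] += 1
    return counts

data = [12, 15, 21, 22, 24, 27, 29, 31, 33, 38, 41, 47]   # hypothetical items

# Halving the class interval doubles the number of (narrower, shorter) blocks.
for width, n_classes in [(20, 2), (10, 4), (5, 8)]:
    print(width, class_frequencies(data, 10, width, n_classes))
```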

Normal Frequency Curve

When a polygon representing a normal distribution is smoothed into a curve (in such a process all irregularities are to be smoothed out), the curve so formed is known as the Normal Frequency Curve (also known as the Gaussian Curve after Karl Friedrich Gauss). This curve should begin and end on the same base line. In the representation of all chance or natural phenomena it would be easy to construct such a curve by simple elimination of all accidental variations. But in the representation of business phenomena, this method is beset with difficulty on account of the uneven distribution of the data (that is to say, when the numbers of items falling into classes above and below the modal group, and located at an equal distance from it, are not approximately the same). In such a case, it is obvious that the sides of the bell-shaped curve will not be symmetrical, and there will be present a skewness. When, however, the data are evenly distributed and the curve is of a symmetrical form, the median will be found to be located within the modal group, for it always bisects the area of the diagram. But when skewness is present, the median will be shifted from its proper position and will fall outside the modal group.

[...] by the addition of successive frequencies. [...]

The magnitude of the median can be located by marking the middle number on the vertical scale, then drawing a horizontal line from there to the ogive, and from the point of intersection dropping a perpendicular to the base (axis); the point where the perpendicular meets the base will give the magnitude of the median item. The same method can be applied to obtain the magnitude of any particular item under review, but it should be noted that the mode is not very easily located on the ogive.
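The graphical reading can be imitated numerically: build the cumulative ("less than") frequencies, then interpolate along the straight segments of the ogive at the middle number. This is a sketch under stated assumptions: the classes and frequencies are hypothetical, the ogive is taken as straight between plotted points, and the "middle number" is taken as (n + 1)/2, following the median-position formula used later in the chapter.

```python
def read_ogive(lower_start, upper_limits, frequencies, height):
    """Read off the base value at which a 'less than' ogive reaches `height`.

    The ogive passes through (lower_start, 0) and through (upper limit of each
    class, cumulative frequency); it is treated as straight between points.
    """
    xs = [lower_start] + list(upper_limits)
    ys = [0]
    for f in frequencies:
        ys.append(ys[-1] + f)            # successive addition of frequencies
    for i in range(1, len(xs)):
        if ys[i - 1] <= height <= ys[i] and ys[i] > ys[i - 1]:
            share = (height - ys[i - 1]) / (ys[i] - ys[i - 1])
            return xs[i - 1] + share * (xs[i] - xs[i - 1])
    raise ValueError("height lies outside the ogive")

# Hypothetical classes 0-10, 10-20, 20-30, 30-40 holding 25 items in all.
freqs = [4, 7, 10, 4]
middle = (sum(freqs) + 1) / 2            # the 13th item
print(read_ogive(0, [10, 20, 30, 40], freqs, middle))   # 22.0
```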

Other Charts 

Sometimes, for the representation of statistical data, we also use Bar Charts (vertical, horizontal, etc.), Maps (cross-hatch, multiple dots, etc.), Block Diagrams, Square Diagrams and Circular Diagrams, Silhouette Charts, etc.; but useful though these devices sometimes are, they can never approach in importance, for the mathematical analysis of the data, the ordinary method of graphing described above.


CHAPTER V 


DETERMINING THE CENTRAL TENDENCY

When dealing with a large mass of data it becomes difficult for us to comprehend its characteristics properly. On account of the difficulty which is thus encountered in grasping a large mass of figures, it is convenient to use a statistical type or average which sums up briefly the central tendency of the mass. It is, in other words, a useful way of representing the various values of a variable by a single value.

Averages generally used in statistical analysis are: (1) Arithmetic Mean, (2) Quadratic Mean or Root-Mean Square, (3) Geometric Mean, (4) Harmonic Mean, (5) Median, and (6) Mode.

Arithmetic Mean 

The Arithmetic Mean is the most common type of average that is used in statistical analysis. It is obtained by dividing the sum-total of the values of a number of items by the number of items itself. Thus, if 993 joint stock companies were formed in 1936, 1,175 in 1937, 986 in 1938, 996 in 1939 and 1,005 in 1940, then the annual average thereof will be calculated by adding together the figures of the five years (5,155) and dividing the same by 5. This would show 1,031 to be the annual average (or Arithmetic Mean).

Sometimes we adopt a different method for the calculation of the average, particularly when the items are too many in number and are of near value. In applying this method, we assume an arbitrary figure and take that to be the average of the group. We then add together (algebraic summation) the deviations of each of the items from the assumed average, and divide the total by the number of items. The quotient thus obtained (called the Correction Factor), when added to the assumed average, will give the true average. Thus in the above example let us assume 1,000 to be the average. The deviations of the actual figures from it (as shown in columns 3 and 4 of the subjoined table), summated algebraically, give 155 (that is, 180 minus 25), which divided by 5 yields +31, and this added to the assumed average (1,000) gives us 1,031, which, as we have already seen, is the true mean of the series. This is illustrated in the following table:

Company Flotations in India (1936-40)

Year     No. of       Deviations from the Assumed Mean
         Companies    Minus          Plus
1936       993           7
1937     1,175                        175
1938       986          14
1939       996           4
1940     1,005                          5
Total    5,155          25            180

Assumed Mean    1,000
True Mean       1,031


It necessarily follows from this that had the assumed average been the true average, the sum of the deviations would have been zero. In other words, the arithmetic mean is the point from which the algebraic sum of the deviations is zero. Interpreted in simple language, this means that if the plus deviations are summated in one column and the minus deviations in another, the two totals will be of the same size, with a difference of zero, showing thereby that the mean is the point around which the plus and minus deviations exactly balance.
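Both routes to the mean, and the zero sum of deviations from the true mean, can be confirmed on the company-flotation figures with the short sketch below (the function names are illustrative).

```python
def mean_direct(values):
    """Sum of the items divided by the number of items."""
    return sum(values) / len(values)

def mean_from_assumed(values, assumed):
    """Short-cut method: assumed mean plus the correction factor, where the
    correction factor is the algebraic sum of the deviations from the
    assumed mean divided by the number of items."""
    correction = sum(v - assumed for v in values) / len(values)
    return assumed + correction

flotations = [993, 1175, 986, 996, 1005]      # companies formed, 1936-40
print(mean_direct(flotations))                # 1031.0
print(mean_from_assumed(flotations, 1000))    # 1031.0
print(sum(v - 1031 for v in flotations))      # 0: deviations from the true mean cancel
```

Any assumed figure gives the same result, because the correction factor absorbs the error of the guess.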

It should be noted that an Arithmetic Mean, though advantageous in many respects, is nevertheless fallacious and inaccurate when it is unsupported by the actual figures used in the computation of such an average. Again, though it is useful in showing the average size of items in a series or array, it gives no hint as to the extent of variation in magnitudes between the highest and lowest figures. This will be clear from the following table, which shows that though the range of variation in the magnitudes of items of the two series (in one case from 1.1 to 21.2 and in the other from 0.9 to 13.8) is considerable, yet their mean is the same.

Average Yield Per Cent Per Annum from 10 Groups of Equities in 1938 & 1939

Securities            1938     1939
Jutes                  1.1      0.9
Coals                  6.3      6.2
Railways               4.7      3.3
Cottons                3.3      5.4
Electrics              7.7      7.7
Minings               10.1     12.2
Engineerings          21.2     13.8
Teas                   3.3      4.0
Banks & Insurance      6.7      6.8
Miscellaneous          2.1      5.2
Mean                   6.55     6.55


Sometimes, when it is desired to give due weight to the relative importance of the items, the weighted average is used. The weighted average is obtained by multiplying each item of a series of quantities by the number of subjects connected with it, these multipliers being called “weights”; the sum of the products, when divided by the sum of the weights, will give us the weighted average. Thus, if 100 things are bought at Rs. 4 each and 50 things at Rs. 5 each, then the weighted average thereof will be calculated as follows:

$$\frac{(100 \times 4) + (50 \times 5)}{100 + 50} = \frac{650}{150} = \text{Rs. } 4.33 \text{ per thing (approx.)}$$
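The same calculation as a small Python sketch; the function name is illustrative. Comparing it with the simple mean of the two prices shows what the weights change.

```python
def weighted_average(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

price = weighted_average([4, 5], [100, 50])   # Rs. 4 for 100 things, Rs. 5 for 50
print(round(price, 2))    # 4.33
print((4 + 5) / 2)        # 4.5, the simple mean, which ignores the quantities bought
```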

In other words, whereas in the calculation of the simple mean we consider each of the items of the series to be of equal importance, in the weighted average we do not. An example will clarify the point. Suppose, for instance, we are to find out the average yield per acre of jute for 1938. We tabulate the data as in the table below.

Acreage & Production of Jute in 1938
(000's Omitted)

Area                                    Acreage    Production    Yield per acre
West Bengal                               287.2       4,280          14.90
North Bengal                              603.1       9,640          15.98
East Bengal                             1,276.3      20,955          16.41
Cooch Behar, Tippera States & Nepal        42.3         610          14.42
Assam                                     219.1       3,275          14.94
Bihar                                     445.0       4,480          10.06
Orissa                                     15.3         165          10.78
Total                                   2,888.3      43,405          97.49

Simple Mean                                                          13.927
Weighted Mean                                                        15.027
Here the simple mean is obtained by dividing the sum total of the figures in column 4 by 7 (the number of areas under review), but in doing this we take no account of the fact that the individual average yields are associated with varying acreages and production in the several areas. These differences are properly allowed for by the weighted mean. In this case the weighted mean is calculated by dividing the total of column 3 (production) by the total of column 2 (acreage). It should, however, be noted in this connection that when different values are associated with like quantities, the weighted and the unweighted averages are the same. But if different values are associated with unlike quantities, the weighted and the unweighted averages will not be the same. In the latter case, with every change or alteration in the quantities, there would be a corresponding change in the magnitude of the weighted average. It can further be stated that “when all the factors of a group or universe are present in a sample in the same proportions as they exist in the group from which the sample was taken, the so-called weighted average and the simple average are of the same size.” Hence there is the necessity of a sample being representative in character.
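The simple and weighted means of the jute yields can be reproduced as below. This sketch takes North Bengal's production as 9,640 (thousand), the figure that agrees with the column total of 43,405; the yields are recomputed from production and acreage, so the simple mean can differ from the printed 13.927 in the last figure, because the printed yields are rounded.

```python
acreage = {   # thousand acres, 1938
    "West Bengal": 287.2, "North Bengal": 603.1, "East Bengal": 1276.3,
    "Cooch Behar, Tippera States & Nepal": 42.3, "Assam": 219.1,
    "Bihar": 445.0, "Orissa": 15.3,
}
production = {   # thousands, 1938
    "West Bengal": 4280, "North Bengal": 9640, "East Bengal": 20955,
    "Cooch Behar, Tippera States & Nepal": 610, "Assam": 3275,
    "Bihar": 4480, "Orissa": 165,
}

yields = {area: production[area] / acreage[area] for area in acreage}

# Simple mean: every area counts once, whatever its size.
simple_mean = sum(yields.values()) / len(yields)
# Weighted mean: each yield weighted by its acreage, which equals
# total production divided by total acreage.
weighted_mean = sum(yields[a] * acreage[a] for a in acreage) / sum(acreage.values())

print(round(simple_mean, 3), round(weighted_mean, 3))   # 13.932 15.028
```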
The weighted average is useful for obtaining the average cost of articles bought at different periods or in varying amounts, or at different prices. It is particularly useful where differing quantities are involved.

In dealing with business or economic data, we often use the Moving Average. This is obtained by averaging a fixed number of consecutive items and then, for each succeeding period, omitting from the component series the earliest item and taking in its place the most recent one. This is very useful in showing the nature of fluctuations over a given period. This is illustrated in the following table, in which a two-year moving average is used:

Retail Price of Rice at Rungpur 1924-33

Year    Price per Md.    Moving     Progressive
        (Rs.)            Average    Average
1924    8.4
1925    8.3              8.35       8.35
1926    8.3              8.30       8.33
1927    8.5              8.40       8.37
1928    9.1              8.80       8.52
1929    7.2              8.15       8.30
1930    7.2              7.20       8.14
1931    4.4              5.80       7.67
1932    3.2              3.80       7.18
1933    3.1              3.15       6.77

Mean    6.77
In measuring the growth or progress of a new business, however, we sometimes use the Progressive Average. Unlike the moving average, in this case we do not omit the earliest figures, but calculate the average by simple inclusion of the newest or most recent figure. So the difference between a moving average and a progressive average is that whereas in a moving average the earliest data are omitted, in a progressive average all the data are taken into consideration. Its chief usefulness lies in measuring the fluctuations of an item over a period for which no representative moving average can be obtained. But with the lapse of time, when it becomes possible to obtain a representative moving average, the progressive average should be discarded in favour of it. For, in such circumstances, a progressive average is likely to be distorted or biased by the earlier data.
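Both averages in the rice table can be generated by the sketch below. It uses the Rungpur prices for 1924-33, taking 1927 as Rs. 8.5 and 1931 as Rs. 4.4, the values implied by the printed moving averages; the function names are illustrative.

```python
def moving_average(values, period):
    """Average of each run of `period` consecutive items; at every step the
    earliest item is dropped and the most recent one is taken in."""
    return [sum(values[i - period + 1:i + 1]) / period
            for i in range(period - 1, len(values))]

def progressive_average(values):
    """Average of all items up to and including each point; nothing is dropped."""
    averages, running_total = [], 0.0
    for count, value in enumerate(values, start=1):
        running_total += value
        averages.append(running_total / count)
    return averages

# Retail price of rice at Rungpur, 1924-33 (Rs. per maund).
rice = [8.4, 8.3, 8.3, 8.5, 9.1, 7.2, 7.2, 4.4, 3.2, 3.1]
print([round(x, 2) for x in moving_average(rice, 2)])
print([round(x, 3) for x in progressive_average(rice)])
```

The first progressive value (1924) is simply the first price; the printed table starts the column at 1925 and shows the values to two decimals.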

The arithmetic mean of a Frequency Distribution (particularly of a continuous type) is calculated by multiplying the midpoints of the classes by the number of frequencies in each class, and then dividing the summation of the individual products by the total number of frequencies in the distribution. The reason for using the midpoints of the classes is that, assuming them to be the means of the respective classes, the algebraic sum of the deviations within each class is zero.

When, however, the distribution is not of normal character, it is better to calculate the mean by multiplying each individual item by its respective value, summating the products thus obtained, and then dividing by the total number of items. This will give the weighted mean of a frequency distribution. When, however, the distribution is of the discrete type, as with interest rates on farm mortgages, the mean is calculated by multiplying each interest rate by the value of the mortgage, and then by the number of frequencies, summating, and dividing by the total value of all the mortgages.

The application of the method is exemplified in the following table:

Class    Frequency    Col. 1 × Col. 2
1            5               5
2            8              16
3            3               9
4            2               8
5            1               5
6            2              12
7            1               7
8            1               8
9            7              63
Total       30             133

$$\text{Mean} = \frac{133}{30} = 4.4333$$
In other words, the mean of a frequency series is the value obtained by dividing the summation of the products of frequency times the class or variable by the total number of frequencies.
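A minimal sketch of the frequency-series mean, checked against the table above. For a continuous distribution the class midpoints would be passed as `values`; the second example, with midpoints, is hypothetical.

```python
def frequency_mean(values, frequencies):
    """Sum of (value x frequency) divided by the total frequency.

    For a continuous distribution, pass the class midpoints as `values`.
    """
    total_frequency = sum(frequencies)
    return sum(v * f for v, f in zip(values, frequencies)) / total_frequency

classes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
freq = [5, 8, 3, 2, 1, 2, 1, 1, 7]
print(round(frequency_mean(classes, freq), 4))   # 4.4333

# Hypothetical continuous classes 0-10, 10-20, 20-30, 30-40 via their midpoints.
print(frequency_mean([5, 15, 25, 35], [4, 7, 10, 4]))
```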


Quadratic Mean 

A mean that is frequently used for various 
kinds of statistical analysis is the Quadratic 
Mean or the Root-Mean Square. It is obtained 
by extraction of the square root of the arithmetic 
mean of the squares of the items contained in 
a series. Thus, if we are to find out the 
quadratic mean of 4 and 5, it would be calculated 
as follows : 


$$\text{Q. M.} = \sqrt{\frac{4^2 + 5^2}{2}} = \sqrt{\frac{16 + 25}{2}} = \sqrt{20.5} = 4.528$$

Because of its use in connection with the 


measurement of standard deviations, the root- 
mean-square ranks as one of the most important 
of statistical averages. 
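The root-mean-square calculation in a few lines (the function name is illustrative):

```python
import math

def quadratic_mean(values):
    """Square root of the arithmetic mean of the squares of the items."""
    return math.sqrt(sum(v * v for v in values) / len(values))

print(round(quadratic_mean([4, 5]), 3))   # 4.528
```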


Geometric Mean 

Sometimes one or two items in a component 
series may have such disparate magnitudes, 
that the arithmetic mean thereof cannot be 
said to represent truly the magnitudes involved. 
In such circumstances, the Geometric Mean 
is of great advantage. The Geometric Mean 
of a set of n positive numbers is obtained by simple extraction of the nth root of their product. In the calculation of the geometric mean, the use of logarithms affords great facility. The logarithm of the geometric mean is the arithmetic mean of the logarithms of the items contained in a series. Thus the geometric mean of 4 and 5 will be worked out as follows:

$$\text{G. M.} = \sqrt{4 \times 5} = \sqrt{20} = 4.472$$

or

$$\log \text{G. M.} = \frac{\log 4 + \log 5}{2} = \frac{0.6021 + 0.6990}{2} = 0.6505, \qquad \text{G. M.} = 4.472.$$
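The logarithmic route described above (log of the G. M. equals the mean of the logs) can be sketched as follows; the function name is illustrative.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive numbers, found through logarithms:
    log G.M. is the arithmetic mean of the logarithms of the items."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean is defined here for positive numbers only")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log          # antilogarithm

print(round(geometric_mean([4, 5]), 3))   # 4.472
```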


Harmonic Mean 

The Harmonic Mean is obtained by dividing the total number of items by the sum of the reciprocals* of the items. Thus the harmonic mean of the example we have previously worked out will be calculated as follows:

$$\text{H. M.} = \frac{2}{\frac{1}{4} + \frac{1}{5}} = \frac{2}{0.45} = 4.444$$

Theorem: “In a series of positive terms, the quadratic mean is greater than the arithmetic mean, the arithmetic mean is greater than the geometric mean, which in its turn is greater than the harmonic mean, unless the terms are equal, in which case the values of the four are identical.”

* Reciprocals are two expressions so related to one another that their product is 1. Thus 1/5 is the reciprocal of 5.
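The harmonic mean, and the ordering stated in the theorem, can be checked numerically with this sketch (function names are illustrative):

```python
import math

def harmonic_mean(values):
    """Number of items divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

def four_means(values):
    """Quadratic, arithmetic, geometric and harmonic means of positive terms."""
    n = len(values)
    qm = math.sqrt(sum(v * v for v in values) / n)
    am = sum(values) / n
    gm = math.exp(sum(math.log(v) for v in values) / n)
    hm = harmonic_mean(values)
    return qm, am, gm, hm

print(round(harmonic_mean([4, 5]), 3))   # 4.444
qm, am, gm, hm = four_means([4, 5])
print(qm > am > gm > hm)                 # True, since the terms differ
```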

Median 

We have already seen in connection with frequency distributions that when we arrange a series of items side by side in order of their ascending magnitudes, the arrangement is known as an array, and the middle or central item in the array, which most closely corresponds to the magnitudes of all other items in the array, is known as the Median. The Median is located half-way down the array, and for determining its position the following formula is used:

$$\text{Position of the Median} = \frac{n + 1}{2}$$

It should be noted that if the number of items in the array be odd, then there is no difficulty in locating the Median. For example, in the frequency distribution quoted on page 16, the median is the 210th item. In examples of an even number, however, the median is fixed midway between the two middle items. For example, if the number of items in the frequency distribution on page 16 had been 420, then the median would have been the 210.5th item. (In the above formula n denotes the number of items in the array.)

For measuring the magnitude of the Median, the following simple formula may be used in preference to others given in most books on Statistics:

$$M = L + \frac{p}{f} \times c$$

where M = the Median; L = the lower limit of the class in which the median is located; c = the class interval; f = the frequency of the class in which the median occurs; and p = the number of items that must be counted in the median class to determine where the median occurs.
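The position formula and the interpolation formula M = L + (p/f) × c can be combined in a short sketch. The grouped data are hypothetical, the function names are illustrative, and the median item is located at (n + 1)/2 as in the text.

```python
def median_position(n):
    """Position of the median item in an array of n items."""
    return (n + 1) / 2

def grouped_median(lower_limits, class_interval, frequencies):
    """M = L + (p / f) * c for a grouped frequency distribution.

    L: lower limit of the median class, f: its frequency, c: the class
    interval, p: items still to be counted inside the median class to reach
    the median position.
    """
    position = median_position(sum(frequencies))
    counted = 0                          # items in the classes below
    for L, f in zip(lower_limits, frequencies):
        if counted + f >= position:
            p = position - counted
            return L + p / f * class_interval
        counted += f
    raise ValueError("median position lies beyond the last class")

print(median_position(419), median_position(420))          # 210.0 210.5
print(grouped_median([0, 10, 20, 30], 10, [4, 7, 10, 4]))  # 22.0
```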

Thus, with the help of the above formula 
we can compute the magnitude of the median 
in the frequency distribution on page 16 as 

follows : 


M = 30,001 + 


5000 

GO 


X 10 ) = 33,125. 


Mode 

When in an array we meet with a predominating group of the same or approximately the same magnitude, the predominant group is called the Mode or the Norm. It is indeed the item or value which occurs the greatest number of times, or the point of greatest density, or of the predominant or most fashionable value. In graphical representation the curve would flatten out at its highest point, in the region of the modal group. In the case of a group that is represented by a continuous series, the value is the abscissa of the maximum ordinate. The value of the Mode can be determined with the help of the following formula:


$$M_o = L + \frac{C \times F}{F + f}$$

where Mo = the Mode; L = the lower limit of the modal group; F = the number of frequencies in the next higher class, or in the class immediately above the one in which the mode is located; C = the class interval; and f = the number of frequencies in the next lower class, or in the class immediately below the one in which the mode is located.

The mode is less definite in position than the median. But as compared with the mean, whereas the mean may correspond to no reality, the Mode is precisely the value for which the most numerous instances can be found. The special feature of the mode is that it remains unaffected by extreme items. It is an indication of the type from which others are diverging.
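The mode formula can be sketched as follows on the same hypothetical grouped data used for the median. Treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something the text specifies; the function name is illustrative.

```python
def grouped_mode(lower_limits, class_interval, frequencies):
    """Mo = L + C * F / (F + f)

    L: lower limit of the modal class (the class with the largest frequency),
    C: the class interval, F: frequency of the next higher class,
    f: frequency of the next lower class (0 is assumed where none exists).
    """
    k = max(range(len(frequencies)), key=frequencies.__getitem__)
    F = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    f = frequencies[k - 1] if k > 0 else 0
    if F + f == 0:
        raise ValueError("the formula needs at least one neighbouring frequency")
    return lower_limits[k] + class_interval * F / (F + f)

print(round(grouped_mode([0, 10, 20, 30], 10, [4, 7, 10, 4]), 2))   # 23.64
```

Because the class below the modal class (7) is heavier than the class above it (4), the mode is pulled towards the lower part of the modal class.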

Quartiles 

The Upper and the Lower Quartiles can very easily be located with the aid of the following formulas:

$$\text{Position of the Upper Quartile} = \frac{3(n + 1)}{4}, \qquad \text{Position of the Lower Quartile} = \frac{n + 1}{4}$$

In a normal distribution both the median and the mode would coincide with the mean.



CHAPTER VI 


MEASUREMENT OF SCATTER 

Useful though they are in conveying to us 
an idea as to the character of the mass, the 
averages, however, do not furnish us any 
information as to the way the different values 
of the variable deviate from them. The way 
in which the different values of the variable 
deviate from the average is known as the 
Scatter or Dispersion. 

Dispersion or scatter is measured by taking 
into account the extent to which the items 
representing the values of a variable deviate 
on an average from a standard type or item 
like the average, the median and the mode. 
In statistical analysis we note and record 
both the absolute and relative dispersion of 
a trait or character. 

Absolute dispersion is measured by calculation of the mean deviation of a series. To measure the relative dispersion of a trait, or the ratio of the dispersion to the standard type, we use the coefficient of dispersion (the fraction of variation occurring in a group) for each of the groups under review. This is obtained by dividing the absolute measure of dispersion used by that magnitude which has been selected as representative of the data under review and from which the deviations have been measured.

Mean Deviation 

A very satisfactory measure of dispersion in many cases is the Mean Deviation. To calculate it, we first summate the deviations from the standard type, and then divide the sum total by the number of items under review. In other words, it is merely the simple average of the deviations (without any regard for their signs). The following table is illustrative of this:


Retail Price of Rice at Rungpur 1919-24

Year    Price per Md.    Deviations from the Mean (d)
        (Rs.)            (Mean = 7.25)
1919     7.5              .25
1920     7.9              .65
1921     7.5              .25
1922     5.3             1.95
1923     6.9              .35
1924     8.4             1.15
Total   43.5             4.60

$$\delta = \frac{\Sigma d}{n} = \frac{4.60}{6} = 0.767$$
The mean deviation from the mode and from the median is also calculated in the same way. If d be the deviations from the average, dM the deviations from the median and dMo the deviations from the mode, and n the number of observations, the formulas for the determination of the mean deviation from the average, the median and the mode will be respectively as follows:

$$\delta = \frac{\Sigma d}{n}, \qquad \delta_M = \frac{\Sigma d_M}{n}, \qquad \delta_{Mo} = \frac{\Sigma d_{Mo}}{n}$$

The result thus obtained will give us the absolute measure of dispersion, and the three kinds of mean deviation are symbolically represented as δ, δM and δMo.

In the case of a frequency distribution, how¬ 
ever, the mean deviation is calculated by 
“determining the difference between the mean 
and the midpoint of each class, signs ignored, 
multiplying this difference in each instance by 
the number of frequencies in the particular 
class and then summating and dividing by the 
total number of frequencies in the distribution.” 
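The mean deviation about the mean and about the median, and the frequency-distribution version just quoted, can be sketched as follows. The rice prices are those of the 1919-24 table; the grouped example reuses a simple four-class series and is only illustrative, as are the function names.

```python
def mean_deviation(values, about):
    """Simple average of the deviations from `about`, signs ignored."""
    return sum(abs(v - about) for v in values) / len(values)

def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def mean_deviation_grouped(midpoints, frequencies):
    """|midpoint - mean| times the class frequency, summed, over total frequency."""
    n = sum(frequencies)
    m = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    return sum(abs(x - m) * f for x, f in zip(midpoints, frequencies)) / n

rice = [7.5, 7.9, 7.5, 5.3, 6.9, 8.4]      # Rs. per maund, 1919-24
mean = sum(rice) / len(rice)                # 7.25
print(round(mean_deviation(rice, mean), 3))           # 0.767
print(round(mean_deviation(rice, median(rice)), 3))   # 0.683
print(mean_deviation_grouped([60, 70, 80, 90], [1, 2, 3, 4]))
```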

Coefficient of Dispersion 

When comparing the dispersion of one group 
of observations with another we use the 
Coefficient of Dispersion. This Coefficient is obtained by simply dividing the average deviation by the mean. To express the coefficient of dispersion as a percentage, multiply it by 100.


Standard Deviation 

Whereas the mean deviation is a measure 
of the extent to which the items in a series 
deviate on an average from a standard type, 
the Standard Deviation, on the other hand, 
is a measure of the extent to which the items 
deviate from the simple average, giving weight 
to extreme deviations. It is the quadratic mean 
of the deviations from the arithmetic mean, 
and is sometimes called the Root-Mean-Square 
Deviation. To calculate it, we square the deviation of each item from the mean, summate the squares, divide the summation by the number of observations, and then extract the square root of it. Stated simply, it is merely the value obtained by extracting the square root of the average of the squares of the deviations from the simple arithmetic mean, and can be symbolically expressed as follows:

$$\sigma = \sqrt{\frac{\Sigma d^2}{n}}$$

where σ = the standard deviation; d² = the square of the deviation of an item from the mean; and n = the number of observations.
The following table illustrates the calculation of the standard deviation of a series:

Retail Price of Rice at Rungpur 1919-24

Year    Price per Md.    Deviations from the Mean    Sq. of
        (Rs.)            (Mean = 7.25)               Deviations
1919     7.5               .25                         .0625
1920     7.9               .65                         .4225
1921     7.5               .25                         .0625
1922     5.3              1.95                        3.8025
1923     6.9               .35                         .1225
1924     8.4              1.15                        1.3225
Total   43.5              4.60                        5.7950

$$\sigma = \sqrt{\frac{5.7950}{6}} = \sqrt{0.9658} = 0.983$$
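The same calculation in Python, dividing by n as the text does (Python's standard `statistics.pstdev` uses the same divisor). The function name is illustrative.

```python
import math

def standard_deviation(values):
    """Square root of the mean of the squared deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

rice = [7.5, 7.9, 7.5, 5.3, 6.9, 8.4]        # Rs. per maund, 1919-24
print(round(standard_deviation(rice), 3))    # 0.983
```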


The standard deviation of a frequency series is calculated with the help of the following formula:

$$\sigma = \sqrt{\frac{\Sigma f d^2}{n}}$$

where f = the frequencies; d = the deviation of the class from the mean of the series; and n = the number of observations.

The application of the above formula is shown in the table below:

Series     f       d       fd²
60         1     -20       400
70         2     -10       200
80         3       0         0
90         4      10       400
Total     10              1,000

(Mean of the Series = 80.)

$$\sigma = \sqrt{\frac{1{,}000}{10}} = \sqrt{100} = 10$$
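A sketch of the frequency-series formula, checked on the table above (the function name is illustrative):

```python
import math

def frequency_sd(values, frequencies):
    """sqrt( sum(f * d^2) / n ), with d measured from the mean of the series."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / n)

print(frequency_sd([60, 70, 80, 90], [1, 2, 3, 4]))   # 10.0
```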
Standard Deviation of Two Groups

The standard deviation of two groups of 
data is calculated with the method exemplified 
in the following table : 


Group 

T 

dx 

(Dev. 

from 

Mean) 

y 

Group 

II 

dy 

(Dev. 

from 

Mean) 

(x-y) 

(x-y)— 

Mean 
of (x-y) 

l$q. ot dev 
of(x-y) 
from its 
mean 

■ 1 — • • ' 

0 

—7 

10 

—4 

1 

—1-2 

1-44 

12 

— l 

10 

—4 

•> 

—0-2 

004 

14 

_o 

13 

—1 

1 

—1-2 

1-44 

15 

—1 

15 

1 

i) 

_.»-2 

484 

10 

0 

13 

— 1 

3 

03 

064 

IS 

2 

14 

0 

4 

1-8 

3-24 

1!) 

3 

14 

0 

fi 

28 

7-S4 

20 

4 

19 

5 

1 

— 1-2 

l 44 

** *" 

<> 

IT 

3 

5 

2-8 

7-84 


where x = Group I; dx = deviation from the mean of group x; y = Group II; dy = deviation from the mean of group y.

Mean of Group I = 16.
Mean of Group II = 14.
Mean of Col. 5 = 2.2.
Total of Col. 7 = 33.60.

$$\sigma_{x-y} = \sqrt{\frac{33.60}{10}} = 1.83$$
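The steps of columns 5 to 7 (take x minus y for each pair, measure each difference from the mean difference, square, average, take the root) are sketched below. The paired data are hypothetical, chosen only to keep the arithmetic easy to follow; the function name is illustrative.

```python
import math

def sd_of_differences(x, y):
    """Standard deviation of the paired differences (x - y)."""
    diffs = [a - b for a, b in zip(x, y)]                 # column 5
    mean_diff = sum(diffs) / len(diffs)
    squares = [(d - mean_diff) ** 2 for d in diffs]       # columns 6 and 7
    return math.sqrt(sum(squares) / len(diffs))

group_1 = [10, 12, 14, 16]    # hypothetical
group_2 = [9, 12, 12, 13]     # hypothetical
print(round(sd_of_differences(group_1, group_2), 3))
```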


Coefficient of Variation 

When the standard deviation is expressed as a percentage of the average, it is known as the Coefficient of Variation. It is obtained by multiplying σ by 100 and dividing the product by the simple arithmetic mean. The formula for it is:

$$V = \frac{100\,\sigma}{m}$$

where m = the arithmetic mean.
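Applied to the 1919-24 rice prices, the coefficient of variation can be computed as in this sketch (the function name is illustrative):

```python
import math

def coefficient_of_variation(values):
    """V = 100 * sigma / m, with sigma computed by dividing by n."""
    n = len(values)
    m = sum(values) / n
    sigma = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return 100 * sigma / m

rice = [7.5, 7.9, 7.5, 5.3, 6.9, 8.4]          # Rs. per maund, 1919-24
print(round(coefficient_of_variation(rice), 2))  # 13.56 (per cent)
```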

Quartile Deviation 

This measure of dispersion is calculated by use of the formula:

$$\text{Quartile Deviation} = \frac{Q_u - Q_l}{2}$$

where Qu = the magnitude of the upper quartile; Ql = the magnitude of the lower quartile.

The Quartile Coefficient of Dispersion is obtained by application of the formula:

$$\frac{\dfrac{Q_u - Q_l}{2}}{\dfrac{Q_u + Q_l}{2}} = \frac{Q_u - Q_l}{Q_u + Q_l}$$
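The quartiles, the quartile deviation and the quartile coefficient can be computed together as below. The quartile positions follow the formulas (n + 1)/4 and 3(n + 1)/4 given earlier; filling in a fractional position by straight-line interpolation between neighbouring items is an assumption of this sketch (the text only states the "midway" rule for the median). Data and function names are hypothetical.

```python
def value_at_position(values, position):
    """Item at a (possibly fractional) 1-based position in the sorted array,
    interpolating between neighbouring items."""
    s = sorted(values)
    whole = int(position)
    frac = position - whole
    if whole < 1 or whole > len(s) or (frac and whole >= len(s)):
        raise ValueError("position lies outside the array")
    if frac == 0:
        return s[whole - 1]
    return s[whole - 1] + frac * (s[whole] - s[whole - 1])

def quartile_measures(values):
    """Return (lower quartile, upper quartile, quartile deviation, coefficient)."""
    n = len(values)
    q_l = value_at_position(values, (n + 1) / 4)
    q_u = value_at_position(values, 3 * (n + 1) / 4)
    return q_l, q_u, (q_u - q_l) / 2, (q_u - q_l) / (q_u + q_l)

print(quartile_measures([2, 4, 6, 8, 10, 12]))   # hypothetical array
```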

Lorenz Curve 

Devised by Dr. Lorenz, this graphic method is used for the measurement of divergences from the average. Dr. Lorenz used it specially for the measurement of the distribution of wealth, and it “takes the form of a cumulative percentage curve, combining the percentage of items under review with the percentage of wealth or other factor distributed among such items.” It is an ideal method for comparing the distribution of profits over various groups of businesses.
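The points of a Lorenz curve can be computed as in this sketch: cumulative percentage of items against cumulative percentage of the amount they hold. Ordering the items from the smallest upward is the usual convention and is assumed here; the profit figures are hypothetical. With an equal distribution every point would lie on the diagonal (x equal to y), so the distance from that line shows the divergence.

```python
def lorenz_points(amounts):
    """Cumulative % of items (x) against cumulative % of the total amount (y),
    taking the items from the smallest upward."""
    s = sorted(amounts)
    total, n = sum(s), len(s)
    points, running = [(0.0, 0.0)], 0
    for i, a in enumerate(s, start=1):
        running += a
        points.append((100 * i / n, 100 * running / total))
    return points

profits = [2, 3, 5, 10, 30]   # hypothetical profits of five businesses
for x, y in lorenz_points(profits):
    print(f"{x:5.1f}% of businesses earn {y:5.1f}% of the profits")
```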

Skewness 

Skewness is the distortion of the symmetry of the dispersion of items in a group, the items being spread further on one side than on the opposite side. This is of the same significance as saying that “skewness” is merely “distorted normal distribution.”* The formula used for the calculation of the Coefficient of Skewness is as follows:

$$s = \frac{A - M_o}{\sigma}$$

where s = the coefficient of skewness; A = the arithmetic mean; Mo = the magnitude of the mode; and σ = the standard deviation.

In other words, it is calculated by subtracting
the mode from the mean, and dividing
by the standard deviation. “This quotient
or value thus obtained is an expression of how
many standard deviations, or what part of
a standard deviation, the mean deviates from
the mode. Whenever the mode is greater
than the mean there is a minus quantity

* A normal distribution has, therefore, a skewness of zero.




54 STATISTICS—HOW TO HANDLE THEM 


when it is subtracted from the mean, indicating 
that the distribution is negatively skewed. 
When the mean is greater than the mode 
there is positive skewness. By merely deter¬ 
mining the difference between the mean and 
mode without subsequent division by any 
measure we have an expression in actual values 
that serves as an indication of skewness. If, 
for example, the value of 6 is the mean and 
the value of 4 is the mode it can be seen 
immediately that the mode is 2 less than the 
mean.” 
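A minimal sketch of the coefficient of skewness and the reading of its sign. The mean of 6 and mode of 4 come from the example just quoted; the standard deviation of 2 is a hypothetical value added for illustration, and the function names are illustrative.

```python
def coefficient_of_skewness(mean, mode, sd):
    """s = (A - Mo) / sigma: how many standard deviations the mean lies from the mode."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean - mode) / sd

def direction(s):
    if s > 0:
        return "positive skewness"
    if s < 0:
        return "negative skewness"
    return "symmetrical"

s = coefficient_of_skewness(6, 4, 2)   # sigma = 2 is hypothetical
print(s, direction(s))                 # 1.0 positive skewness
```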


Calculation of Probable Error 

Pertinent in this connection are the various methods adopted for the calculation of the probable error of the various statistical constants as well as the coefficients. For instance, the probable error of the mean is calculated with the help of the following formula:

$$\text{P. E.}_{\text{mean}} = \frac{0.6745\,\sigma}{\sqrt{n}}$$

In other words, to calculate it we multiply
the standard deviation by 0.6745 and then
divide the product by the square root of
the number of items. “The probable error
of the mean is quite commonly used as a
measure of reliability. Whenever the probable
error is no greater than one-sixth of the standard
deviation the sample of data may be considered
fairly reliable, and when it is no greater than
one-twelfth of the standard deviation it is
a safe deduction that the sample is sufficiently 
reliable for practical purposes. In chance or 
random selections the probabilities involved 
in sampling are such that in only one case 
in a hundred is there a chance of the arithmetic 
average being inaccurate to the extent of 
more than four times the size of the associated 
probable error. That is, if we were to select 
a sample at random and then calculate the 
mean and the probable error of the mean 
there is but one chance in a hundred that 
the true mean of the entire universe would 
differ from the mean of the sample to the 
extent of more than four times the probable 
error of the mean of the sample.” 


The probable error of the standard deviation is calculated with the help of the following formula:

$$\text{P. E.}_{\sigma} = \frac{0.6745\,\sigma}{\sqrt{2n}}$$

In other words, to calculate it we multiply the standard deviation by 0.6745 and then divide the product by the square root of two times the number of items.

The formula that may be adopted for the determination of the probable error of a distribution is as follows:

$$\text{P. E.}_{d} = 0.6745\,\sigma$$

In other words, to calculate it we multiply the standard deviation by 0.6745. This is as good as saying that “in any other distributions of like or similar data the chances are even that one-half of the number of items will fall within a range of the mean of the first distribution plus or minus 0.6745 of the standard deviation.”
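The three probable-error formulas and the quoted rule of thumb for the reliability of a mean are sketched below, using the standard deviation of the 1919-24 rice prices (about 0.983, n = 6) as an example. The function names and the wording of the returned labels are illustrative.

```python
import math

PE_FACTOR = 0.6745

def probable_errors(sigma, n):
    """Probable errors of the mean, of the standard deviation, and of a distribution."""
    return {
        "mean": PE_FACTOR * sigma / math.sqrt(n),
        "standard deviation": PE_FACTOR * sigma / math.sqrt(2 * n),
        "distribution": PE_FACTOR * sigma,
    }

def reliability_of_mean(sigma, n):
    """Compare the P.E. of the mean with 1/12 and 1/6 of the standard deviation."""
    pe = probable_errors(sigma, n)["mean"]
    if pe <= sigma / 12:
        return "sufficiently reliable"
    if pe <= sigma / 6:
        return "fairly reliable"
    return "not shown to be reliable by this rule"

for name, value in probable_errors(0.983, 6).items():
    print(f"P.E. of the {name}: {value:.3f}")
print(reliability_of_mean(0.983, 6))
```

Since the ratio of the P.E. of the mean to σ is 0.6745/√n, the rule amounts to needing at least 17 items for "fairly reliable" and at least 66 for "sufficiently reliable".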


CHAPTER VII 


INDEX NUMBERS 

By an Index Number is meant a value expressing the percentage of change taking place in the characteristic property of a series of items of a time series as compared with the level (written as 100) at any given base date. Index numbers are very widely used in the business field, and are useful in showing the relative magnitudes (expressed as a percentage on a base period) of changing or changed conditions from time to time. Thus, index numbers are applied to the measurement of the general movement of prices, cost of living, wages, production, consumption, employment, etc. As applied to the measurement of price changes, they are particularly useful in showing the reasons for the fluctuations of prices over a certain period. When the changes in the prices of two commodities under nearly equal conditions are compared, the common factor in such a comparison is, of course, the change in the purchasing power of money. If the indices in regard to two or several commodities show similar fluctuations, it is assumed that the change in their prices is due to the general factor of a change in the purchasing power of money, but when there is any sharp divergence in a particular group, it is assumed that besides the general factor extraneous factors or causes are perhaps at work. In other words, index numbers are very useful in enabling us to measure the size of any hidden factor which, though not capable of direct measurement, can be measured by taking into consideration quantities which are influenced by such a factor.

The method that is generally adopted for framing an Index Number is to select the average or the actual value or magnitude of an arbitrary period as the Base (equated to 100), and to calculate the values or magnitudes of the subsequent periods as percentages of it. As applied to price indices, the problem is of the same significance as that of comparing the purchasing power of a rupee in one year with its purchasing power in another. The series of proportional numbers so obtained constitutes what are known as the Fixed Base Numbers.
Chain Base Numbers

Sometimes, however, we frame what are known as the Chain Base Numbers (also known as the Link Index Numbers). A different method is, of course, adopted for this. In this case, we do not take the value or magnitude of any arbitrarily selected period as the base, but the average or the actual price of the immediately preceding period as the base (100), and calculate the Index Number of the period under review as a percentage of it. An advantage of the Chain Base Method is that it makes comparison with current conditions possible, whereas in the case of a Fixed Base Index the results are likely to be distorted or nullified if the Base is too old. When the year-to-year movements are of more interest than the change over a longer period of time, this method is of definite advantage.

The following is an illustration of both Fixed Base Numbers and Chain Index Numbers of First Grade Jute Prices in Calcutta during February to December 1941:

Index Numbers of Jute Prices

1941          Price        Fixed Base No.       Chain Index No.
              (Rs. as.)    (July 1914 = 100)
February      32-8          47                  100
March         37-0          54                  110
April         35-0          51                   95
May           44-8          64                  127
June          48-8          70                  108
July          51-0          74                  106
August        66-0          96                  129
September     69-0         100                  104
October       61-8          89                   89
November      63-0          91                  102
December      53-0          77                   84
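Both kinds of index number in the table can be recomputed with a few lines of Python. This is a sketch under stated assumptions: prices in rupees and annas are converted at 16 annas to the rupee; February's price is taken as Rs. 32-8, which is what its index of 47 implies; and, since the table does not print the July 1914 base price, it is taken here as Rs. 69, which is what September 1941 = 100 implies. The fixed-base figures then agree with the table; the chain figures agree for most months but come out one to four points different for March, June, July and September, presumably through rounding or slips in the original working.

```python
# First grade jute, Calcutta, 1941: price in (rupees, annas); 16 annas = 1 rupee.
PRICES = [
    ("February", 32, 8), ("March", 37, 0), ("April", 35, 0), ("May", 44, 8),
    ("June", 48, 8), ("July", 51, 0), ("August", 66, 0), ("September", 69, 0),
    ("October", 61, 8), ("November", 63, 0), ("December", 53, 0),
]
BASE_PRICE = 69.0  # assumed July 1914 price (Rs.), inferred from September = 100

months = [m for m, _, _ in PRICES]
prices = [rs + annas / 16 for _, rs, annas in PRICES]

# Fixed base: every month as a percentage of the same base price.
fixed_base = [round(p / BASE_PRICE * 100) for p in prices]

# Chain (link) base: every month as a percentage of the month before;
# the first month has no predecessor in the table and is set to 100.
chain = [100] + [round(cur / prev * 100) for prev, cur in zip(prices, prices[1:])]

for month, fixed, link in zip(months, fixed_base, chain):
    print(f"{month:<10} {fixed:>4} {link:>4}")
```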



Composite Index Numbers


When we construct an Index Number with the values or magnitudes of two or more things or commodities in it, it is called a Composite Index Number. Such an index number may be either weighted or unweighted. To obtain an unweighted composite index number we add together the values or magnitudes of all the items for each year, including the base year, divide the total for each year by the total for the base year, and multiply by 100. To obtain a weighted composite index number, we first multiply the value or index of each item by its weight, and then proceed to calculate as above. The most familiar type of weighted composite index number is the Cost of Living Index Number. This is illustrated by the following table:


Cost of Living Index on the Basis of Retail Prices Prevailing on the 23rd September 1942
(For the working class in and around Calcutta)

Items                    Weightage   Pre-war   Present   Weightage ×
                                     Index     Index     Present Index
(a) Foodstuffs:
    Rice                    24         100       202         4,848
    Atta & Flour            11.3       100       186         2,102
    Dal                      6.7       100       189         1,266
    Ghee                     7.7       100       165         1,271
    Oil                      5         100       131           655
    Salt & Spices            5.3       100       207         1,097
    Sugar                    5         100       125           625
    Tea                      1         100       100           100
    Milk                     9         100       100           900
    Vegetables & Fish       25         100       100         2,500
    Total                  100                              15,364
    Index                                                      154

(b) Fuel & Lighting:
    Kerosene Oil            35         100       171         5,985
    Coal & Firewood         61         100       231        14,091
    Matches                  4         100       180           720
    Total                  100                              20,796
    Index                                                      208

(c) Coarse Cloth           100         100       198

(d) and (e) Miscellaneous & House Rent: Constant.

Composite Index

Group                    Weightage   Pre-war   Present   Weightage ×
                                     Index     Index     Present Index
(a) Food                    52.5       100       154         8,085
(b) Fuel & Lighting          7.5       100       208         1,560
(c) Coarse Cloth             7         100       198         1,386
(d) Miscellaneous           19         100       100         1,900
(e) House Rent              14         100       100         1,400
Total                      100                              14,331
Cost of Living Index                                           143


In constructing a Cost of Living Index as above, we first determine the class of people (e.g., industrial workers, artisans, clerks, etc.) for which the index numbers are to be compiled. We then collect a reasonably adequate number of sufficiently accurate samples of family budgets from the class under review, the period chosen for such budgets being one of normal conditions, that is to say, free from abnormally high or low prices. The proportion of expenditure on different articles or objects by an average family is then determined, and the retail price quotations of these articles are collected from standard trade journals, municipal gazettes or official publications. If the quotations are obtained weekly and the cost of living index number is to be calculated for the month, the weekly prices are averaged into monthly figures. To start with, these monthly figures form the base, or 100, and similarly calculated figures for subsequent months are represented as percentages of the prices of the base period. These percentages, or index numbers, for the particular commodities are then multiplied by their respective weights, which represent the relative importance of the respective articles in the family budget (the proportion which expenditure on each article bears to the total expenditure on the group). Then, as shown above, the index numbers of the various groups of articles are arrived at by summing these products and dividing by the total of the weights (here 100). These group index numbers are in turn multiplied by the weights, or relative importance, of the various groups, and by summing the products and dividing by the total of the group weights we arrive at the composite index number which forms the Cost of Living Index for the period under review.
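The weighting and summation just described can be sketched in Python as follows. The weights and present indices are those of the cost of living table, with Vegetables & Fish taken at a weight of 25 (the figure implied by the printed food total of 15,364 and by the weights summing to 100); dividing each weighted total by the sum of the weights turns it into an index. The function name is illustrative.

```python
def weighted_index(rows):
    """rows: (weight, index) pairs; returns sum(weight * index) / sum(weight)."""
    total_weight = sum(w for w, _ in rows)
    return sum(w * i for w, i in rows) / total_weight

# (weight, present index); the pre-war index is 100 for every item.
food = [(24, 202), (11.3, 186), (6.7, 189), (7.7, 165), (5, 131),
        (5.3, 207), (5, 125), (1, 100), (9, 100), (25, 100)]
fuel_and_lighting = [(35, 171), (61, 231), (4, 180)]

food_index = round(weighted_index(food))                # 153.64 -> 154
fuel_index = round(weighted_index(fuel_and_lighting))   # 207.96 -> 208

groups = [
    (52.5, food_index),   # (a) Food
    (7.5, fuel_index),    # (b) Fuel & Lighting
    (7, 198),             # (c) Coarse Cloth
    (19, 100),            # (d) Miscellaneous (constant)
    (14, 100),            # (e) House Rent (constant)
]
cost_of_living = round(weighted_index(groups))          # 143.31 -> 143
print(food_index, fuel_index, cost_of_living)
```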

To arrive at a correct Cost of Living Index Number, we must take care in demarcating the class of people under enquiry, in selecting correctly the representative articles entering into the cost of living of that class, in collecting reliable price quotations, in assigning weights accurately, and in allowing for changes in the demand for various articles, or in their prices, during the period under consideration.

Indices of Industrial Activity 

These are useful in showing the change in the industrial production of a country over a period of time. These index numbers are prepared with the help of output figures of the different industries. The one prepared and published monthly by the Capital of Calcutta has 1935 for its base year, and the following series (with the respective weights within brackets) for its components:

I. Industrial Production: Cotton Manufactures (9), Jute Manufactures (6), Steel Ingots (5), Pig Iron (8), Cement (5), Paper (3).
II. Mineral Production: Coal (7).
III. Rail & River-borne Trade (24).
IV. Financial Statistics: Cheque Clearances (20).
V. Trade, Foreign & Coastal: Exports (4), Imports (3).
VI. Shipping, Foreign & Coastal: Tonnage entered (3), Tonnage cleared (3).

Indices of Business Conditions 

To show changes in the business conditions 
of a country a wider range of data is required, 
and Professor Pigou selected the following 
series for a study of the changes in the business 
conditions of England : 

(i) Unemployment percentage.
(ii) Consumption of pig-iron.
(iii) Prices in England.
(iv) Rates of discount on 3 months' bills.
(v) Volume of manufactured goods.
(vi) Agricultural production.
(vii) Yield per acre of nine principal crops.
(viii) Index of production from mines.
(ix) London Cheque Clearings.
(x) Increase of Bank Credit.
(xi) Credits outstanding.
(xii) Annual increase in the aggregate money wage.
(xiii) Rate of real wages.
(xiv) General aggregate consumption.
(xv) Proportion of Reserve to Liabilities of the Bank of England.


CHAPTER VIII 


CORRELATION & PREDICTIVE EQUATIONS 

It is often noticed in the business field 
that there is a distinct relationship between 
certain allied sets of phenomena. This rela¬ 
tionship is of the nature of cause and effect. 
Thus, there is found to be such a relationship 
between decline in production and rise in 
prices, employment and wholesale commodity 
prices, great industrial activity and high prices 
of equities, and so forth. The mathematical theory by means of which these relationships are found and reduced to formula and number is known as Correlation. The theory of correlation implies two or more sets of variables: those that cause or influence certain changes, and those that are the results or effects of such changes. The series producing the causal factors (e.g., the output of a commodity) are generally known as independent variables (because the changes in them are independent of the other series), and the series of factors which are the results or effects (e.g., the price of a commodity) as the dependent variables (because they are dependent for their changes upon the factors of the other series). These two sets of factors are respectively termed x values and y values. Consequently, if there exists a causal relationship between output and price, the output is referred to as x and the price as y. Again, when the changes in the associated sets of phenomena are in the same direction the correlation is called positive; when, however, it is of an inverse nature it is called negative.

Correlation can be studied from two or 
more aspects of a thing. Thus, for instance, 
when we study the relationship between the 
acreage and price of a crop, we call it a study 
in the correlation of two factors. Again, when 
we study the correlation between the acreage, 
yield, and the price of a crop we call it a correla¬ 
tion of multiple factors. 

Coefficient of Correlation 

For numerical measurement of the actual 
magnitude or degree of correlation that exists 
between two or more associated sets of pheno¬ 
mena we calculate the Coefficient of Correlation. 
The Correlation Coefficient is calculated by such methods that when perfect correlation exists between the factors under review it has a value of 1, and when there is no correlation at all it has a value of 0. When the correlation is of positive character, that is, when the deviations of both factors are in the same direction, a plus (+) sign is used as a prefix, and when it is negative, that is to say, when the movements are in opposite directions, a minus (−) sign is used.

Under actual conditions, however, we seldom find a correlation that measures as great as 1, so that the value of the coefficient of correlation (represented by the symbol r) is generally expected to be some fractional part of 1. The higher the numerical value of the coefficient, the greater is the degree of correlation; whether it is a + or − quantity depends, of course, upon the direction of the correlation. In this connection, the following rules given by W. I. King in his Elements of Statistical Method for the interpretation of the coefficient are to be noted:

1. If the coefficient is less than the probable error, there is no evidence whatsoever of correlation.

2. If the coefficient is more than six times the size of the probable error, the existence of correlation is a perfect certainty.

3. When the probable error is relatively small, if the coefficient is less than 0.30 the correlation cannot be considered at all marked.

4. If the probable error is relatively small, a coefficient above 0.50 indicates decided correlation.
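King's rules can be turned into a small Python helper. This is only a sketch: the rules speak of a “relatively small” probable error without fixing a limit, so the code assumes (our assumption, not King's) that rules 3 and 4 apply only once rule 2 is satisfied, i.e. when r exceeds six times its probable error. It returns one verdict on the existence of correlation (rules 1 and 2) and one on its degree (rules 3 and 4); the call at the end uses r = 0.769 with a probable error of 0.1125, roughly the values of the jute example later in this chapter.

```python
def interpret_r(r, pe):
    """Apply W. I. King's rules of thumb to a coefficient r with probable error pe.

    Returns (existence, degree); degree is None when rules 3 and 4 give no verdict.
    """
    size = abs(r)  # the sign only shows the direction of correlation
    if size < pe:
        existence = "no evidence of correlation"          # rule 1
    elif size > 6 * pe:
        existence = "correlation certain"                  # rule 2
    else:
        existence = "not settled by the rules"

    degree = None
    # Assumption: a "relatively small" p.e. is read as r > 6 * p.e.
    if size > 6 * pe:
        if size < 0.30:
            degree = "not at all marked"                   # rule 3
        elif size > 0.50:
            degree = "decided correlation"                 # rule 4
    return existence, degree

print(interpret_r(0.769, 0.1125))
```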

Pearsonian Coefficient

Karl Pearson's method for the measurement of biological correlation is very useful as well for the measurement of long-term fluctuations in the business world. This is also known as the ‘Product-Moment Correlation’. For this method we use the following formula:

$$r = \frac{\Sigma (x - A)(y - a)}{n\,\sigma_x\,\sigma_y}$$

where r = the coefficient of correlation; x = the independent variables; A = the mean of the independent variables; y = the dependent variables; a = the mean of the dependent variables; n = the number of items under review; σx = the standard deviation of the independent variables; and σy = the standard deviation of the dependent variables.
The application of this formula is illustrated by the following table:

Correlation Between Production & Price of Jute

Year   Production   (x - A)   (x - A)²   Price    (y - a)   (y - a)²   (x - A)(y - a)
          (x)                              (y)
1931      102          14        196      41.00     9.00     81.00        126.0
1932       66         -22        484      25.25    -6.75     45.56        148.5
1933       88           0          0      24.25    -7.75     60.06          0
1934       88           0          0      29.75    -2.25      5.06          0
1935       98          10        100      37.75     5.75     33.06         57.5
1936       86          -2          4      34.00     2.00      4.00         -4.0
Total                            784                        228.74        328.0

Here A = 88 and a = 32.00.

Standard deviation: σx = 11.40; σy = 6.24.

$$r = \frac{\Sigma (x - A)(y - a)}{n\,\sigma_x\,\sigma_y} = \frac{328.0}{6 \times 11.40 \times 6.24} \approx 0.77$$
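The product-moment calculation can be reproduced in Python from the raw figures. In this sketch the 1932 production is taken as 66, which is what the mean of 88 (shown by the zero deviations for 1933 and 1934) and the squared deviation of 484 imply. Recomputed from the column totals, σx is about 11.43 and σy about 6.17, slightly different from the 11.40 and 6.24 printed with the table, so the direct result is r ≈ 0.775 rather than 0.769; both round to 0.77.

```python
import math

production = [102, 66, 88, 88, 98, 86]               # x, 1931-1936
price = [41.00, 25.25, 24.25, 29.75, 37.75, 34.00]   # y

def pearson_r(x, y):
    """Product-moment r = sum((x-A)(y-a)) / (n * sigma_x * sigma_y)."""
    n = len(x)
    A, a = sum(x) / n, sum(y) / n                      # means
    dx = [v - A for v in x]
    dy = [v - a for v in y]
    sigma_x = math.sqrt(sum(d * d for d in dx) / n)
    sigma_y = math.sqrt(sum(d * d for d in dy) / n)
    return sum(p * q for p, q in zip(dx, dy)) / (n * sigma_x * sigma_y)

r = pearson_r(production, price)
print(f"r = {r:.3f}")
```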




Concerned as it is mainly with the measurement of long-time relationship, the Pearsonian Coefficient would prove unsatisfactory if our interest lies in short-term changes. For instance, if we were examining two long-term series that move together over the period as a whole but in opposite directions from year to year, the Pearsonian method would not take these short-period negative movements into account.






However, with a little modification, the Pearsonian method can be adopted for the measurement of correlation of short-term fluctuations. In this case we do not use the deviations of the dependent and the independent variables from the arithmetic mean. Instead, we use the deviations of the dependent and independent variables from the Trend. For this purpose, we calculate the moving averages of the Index Numbers of the two factors, and take the deviations of the indices from their moving averages as the basis for the standard deviation in each case. Then the Pearsonian formula is applied to these deviations.
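A sketch of this short-term method in Python. The two series are hypothetical index numbers, and the trend is taken as a three-year centred moving average; the text does not fix the length of the moving average, so the window is an assumption. Both series rise over the period, so the ordinary coefficient is positive, while their year-to-year swings run in opposite directions, which the coefficient of the deviations from trend brings out.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def deviations_from_moving_average(series, window=3):
    """Deviation of each item from a centred moving average (odd window).

    Items at either end, for which the average cannot be centred, are dropped.
    """
    half = window // 2
    return [series[i] - sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]

x = [10, 14, 12, 16, 14, 18, 16, 20]   # hypothetical index numbers
y = [20, 19, 23, 22, 26, 25, 29, 28]

r_long = pearson_r(x, y)
r_short = pearson_r(deviations_from_moving_average(x),
                    deviations_from_moving_average(y))
print(f"deviations from mean:  r = {r_long:.3f}")
print(f"deviations from trend: r = {r_short:.3f}")
```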

Probable Error of Coefficient 

We have already seen that for the inter¬ 
pretation of r, it becomes necessary to calculate 
the probable error of the coefficient of correlation. 
This becomes particularly necessary when a large number of representative cases are taken into account. The formula employed for the calculation of the probable error of the coefficient of correlation is as follows:

$$\text{p.e.}_r = \frac{0.6745\,(1 - r^2)}{\sqrt{n}}$$

where p.e.r = probable error of the coefficient of correlation; r² = square of the coefficient of correlation; n = total number of items used in the calculation of the coefficient.

In other words, we subtract the square of the coefficient of correlation from unity (1), multiply the remainder by 0.6745, then divide the result by the square root of the total number of items used in finding the coefficient. (For values of 1 − r² consult Tables for Statisticians and Biometricians, Table VIII.)
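The probable error of r, sketched in Python. The figures are those of the jute example in this chapter (r = 0.769 from six years of data); the helper name is illustrative.

```python
import math

def probable_error_of_r(r, n):
    """p.e. of r = 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

r, n = 0.769, 6
pe = probable_error_of_r(r, n)
print(f"p.e. = {pe:.4f}; r is {r / pe:.1f} times its probable error")
```

On these figures r is a little more than six times its probable error, which by King's second rule, given earlier, would be taken to establish the correlation.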

It may be remarked that the probable 
error of correlation ratio and the correlation 
index are also calculated from the same formula 
as above, the proper measure being substituted 
for r. 

Time Lag 

In the study of correlation it is often found 
that changes in one series are not immediately 
followed by changes in another. In other 
words, there is noticed a time lag of several 
months or even of a year or more, before 
the independent factors are found to exert 
their effects upon the dependent factors. In 
calculating the coefficient of correlation between 
such series “it is a good plan to plot each one separately and then determine the approximate time lag by shifting back and forth.” (This can be easily done by holding the charts before a light and moving them back and forth until the position of greatest relationship is determined.)

Another method of “shifting” the data is illustrated by the following table, adapted from Kuznets's Seasonal Variations in Industry and Trade (National Bureau of Economic Research, 1933), giving the indexes of seasonal variation of production and shipments of pneumatic casings in the U.S.A. for the years 1923-31:

Month                  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Index, Shipments (X)    89   81   98  110  113  120  129  122  104   87   72   74
Index, Production (Y)   96  101  114  113  116  112   97  104   91   91   82   83

Shifting the indexes for production one place to the right, we get the new series as follows:

X    89   81   98  110  113  120  129  122  104   87   72   74
Y    83   96  101  114  113  116  112   97  104   91   91   82

and so on. Again, shifting to the left we have the series:

X    89   81   98  110  113  120  129  122  104   87   72   74
Y   101  114  113  116  112   97  104   91   91   82   83   96
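The shifting can also be done arithmetically: compute r for the original pairing and for each shift, and take the shift that gives the highest coefficient. A Python sketch, treating the twelve seasonal indexes as a circular series (as the shifted rows do, carrying December round to the front and January round to the end); August shipments are taken as 122 and January production as 96. On these figures the coefficient is highest (about 0.79, against about 0.63 unshifted and 0.25 for the shift to the left) when production is shifted one place to the right, i.e. when each month's shipments are paired with the previous month's production.

```python
import math

# Seasonal indexes, January to December.
shipments = [89, 81, 98, 110, 113, 120, 129, 122, 104, 87, 72, 74]    # X
production = [96, 101, 114, 113, 116, 112, 97, 104, 91, 91, 82, 83]   # Y

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shift(series, k):
    """Move a circular series k places to the right (k < 0 moves it left)."""
    k %= len(series)
    return series[-k:] + series[:-k] if k else list(series)

results = {k: pearson_r(shipments, shift(production, k)) for k in (-1, 0, 1)}
best = max(results, key=results.get)
for k in sorted(results):
    print(f"shift {k:+d}: r = {results[k]:.3f}")
print("highest r at shift", best)
```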


Correlation Table 

When dealing with a large number of items, it is advantageous to calculate the coefficient of correlation (signified by the letter r) with the help of a table known as the Correlation Table. This is illustrated by the following table:

[Correlation table: Index of Wholesale Commodity Prices (1919-1932) against Index of Employment (1919-1932).]
The above is an example of the construction of a correlation table for determining whether or not a net relationship exists between wholesale commodity prices and employment, using the United States Bureau of Labor Statistics index of wholesale commodity prices and the index of employment from 1919 to 1932. The first step in the construction of such a correlation table is to divide the range of the two variates into convenient divisions. Here the commodity prices range from 104.5 to about 62, and the employment index from 110.9 to 55.2. It will be convenient to divide the ranges as follows: for the former, nine divisions of five units each, and for the latter, twelve divisions of five units each. The respective frequencies will then be grouped as above.

A convenient method that is usually followed for the calculation of the coefficient of correlation from a correlation table is shown below:

[Correlation table with working columns: Index of Wholesale Commodity Prices (1919-1932) against Index of Employment (1919-1932).]

Here x and y = class marks; f = total frequencies of columns; and g = total frequencies of rows.


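$$\text{Av.}\,x = \frac{\Sigma f x}{N} = \frac{215}{120} = 1.7917, \qquad \text{Av.}\,y = \frac{\Sigma g y}{N} = \frac{206}{120} = 1.7167$$

$$\sigma_x = \sqrt{\frac{\Sigma f x^2}{N} - (\text{Av.}\,x)^2} = 2.4593, \qquad \sigma_y = \sqrt{\frac{\Sigma g y^2}{N} - (\text{Av.}\,y)^2} = 2.8901$$

$$r = \frac{\dfrac{\Sigma xy}{N} - (\text{Av.}\,x)(\text{Av.}\,y)}{\sigma_x\,\sigma_y} = \frac{\dfrac{\Sigma xy}{N} - (1.7917)(1.7167)}{(2.4593)(2.8901)} = \frac{\dfrac{\Sigma xy}{N} - (1.7917)(1.7167)}{7.1076} = 0.9490$$

Here N = 120 is the total number of items, and Σxy is the sum, over all the cells of the table, of each cell frequency multiplied by its x and y class marks.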
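The grouped calculation can be sketched in Python. The three-by-three table below is hypothetical; the sketch applies the same formulas, then expands the table into individual items and computes r directly, to show that the two routes agree. Class marks may be measured in any convenient units, such as class-interval steps from an assumed origin, since r is unaffected by a change of origin or scale.

```python
import math

def r_from_correlation_table(x_marks, y_marks, freq):
    """freq[i][j] = number of items in y class i and x class j."""
    n = sum(map(sum, freq))
    f = [sum(row[j] for row in freq) for j in range(len(x_marks))]  # column totals
    g = [sum(row) for row in freq]                                   # row totals
    av_x = sum(fj * x for fj, x in zip(f, x_marks)) / n
    av_y = sum(gi * y for gi, y in zip(g, y_marks)) / n
    sigma_x = math.sqrt(sum(fj * x * x for fj, x in zip(f, x_marks)) / n - av_x ** 2)
    sigma_y = math.sqrt(sum(gi * y * y for gi, y in zip(g, y_marks)) / n - av_y ** 2)
    sum_xy = sum(freq[i][j] * x * y
                 for i, y in enumerate(y_marks)
                 for j, x in enumerate(x_marks))
    return (sum_xy / n - av_x * av_y) / (sigma_x * sigma_y)

# Hypothetical table: class marks in class-interval units.
x_marks = [-1, 0, 1]
y_marks = [-1, 0, 1]
freq = [[3, 1, 0],
        [1, 4, 1],
        [0, 1, 3]]
r_grouped = r_from_correlation_table(x_marks, y_marks, freq)

# Cross-check: expand the table into individual (x, y) items.
xs, ys = [], []
for i, y in enumerate(y_marks):
    for j, x in enumerate(x_marks):
        xs += [x] * freq[i][j]
        ys += [y] * freq[i][j]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
r_direct = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / math.sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)))
print(r_grouped, r_direct)
```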

Mean of Several Values of r

To obtain the mean of several values of r, each value of r is first converted into the corresponding value of z, where z = ½ logₑ[(1 + r)/(1 − r)] (values of z are usually read from tables). The values of z are then averaged, each being weighted by N − 3, where N is the number of items in its group, and the mean z is converted back into r. This is illustrated by the following table.

Calculation of Mean of Several Values of r

If the same two variables are correlated in three groups, the numbers in the groups and the values of r being as follows:

Group      N       r
I         13     0.30
II        38     0.40
III       43     0.35

then the mean value can be calculated as follows:

r          z       N - 3     (N - 3)z
0.30     0.310      10        3.100
0.40     0.424      35       14.840
0.35     0.365      40       14.600
Total               85       32.540

The weighted mean of z is 32.540 ÷ 85. For z = 0.382, r = 0.364. Hence the average correlation in the three groups is 0.364.
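The conversion and averaging can be sketched in Python, where the standard-library `math.atanh` gives z = ½ logₑ[(1 + r)/(1 − r)] and `math.tanh` converts back. Using unrounded z values, the mean z comes out at about 0.3828 and the mean r at about 0.365, agreeing with the tabular working to within the rounding of z.

```python
import math

# (N, r) for the three groups in the example.
groups = [(13, 0.30), (38, 0.40), (43, 0.35)]

def mean_correlation(groups):
    """Average several r's through z = atanh(r), weighting each z by N - 3."""
    total_weight = sum(n - 3 for n, _ in groups)
    mean_z = sum((n - 3) * math.atanh(r) for n, r in groups) / total_weight
    return mean_z, math.tanh(mean_z)

mean_z, mean_r = mean_correlation(groups)
print(f"mean z = {mean_z:.4f}, mean r = {mean_r:.3f}")
```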

Partial Correlation 

We have so far studied simple correlation 
or the extent of co-variation between two 
variables. But a look at the economic or 
business situation will convince any man that 
most of the economic phenomena are influenced 
by a variety of factors rather than by one 
alone. The price of jute, for instance, may be due not merely to the acreage planted, but also to the yield per acre, which in its turn is due to a multitude of factors such as labour, temperature, rainfall, fertilizer and irrigation. Similarly, in the Stock Market it has been perceived that long-term bond prices “respond to such situations as (1) changes in the cost of living, since the public regards bonds as relatively undesirable in protracted periods of rising prices, and vice versa, (2) the earnings applicable to the interest charges, the bond price responding to variations in earnings which threaten or fortify the coupon payments, and (3) other interest rates, which influence the height at which bond prices will capitalize their coupon payments.”

In the field of chemistry, or any other science of a similar nature, it is easier for two or more causal factors of a phenomenon to be isolated and analyzed. Such isolation and analysis are not, however, possible in the field of commerce and industry, as the multitude of factors producing their effects upon a phenomenon can neither be controlled nor their relative importance properly assessed. Thus, for instance, if we were to study the influence of variations in money rates on business conditions, we cannot, for the purpose of our experimentation, keep the money rates inert and stationary for a period of years and then proceed to measure what effects that would produce on commerce and industry. This lack of control over economic factors is, however, made good by the statistical methods of Partial and Multiple Correlation.

In the study of correlation of this type, we measure the correlation of one set of independent variables with a set of dependent variables, keeping the effects of any other set of independent variables constant, fixed or eliminated. This kind of correlation technique is known as Partial Correlation. When only three sets of variables are under review, all that we are concerned with in the study of such correlation is to determine the correlation between x and y, denoted by the symbol rxy; between x and z, denoted by rxz; and between y and z, denoted by ryz. If, for instance, the influence of z is eliminated, or its effects kept fixed and constant, then the correlation between x and y will be calculated with the help of the formula given below:

$$r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$
As explained above, read rxy.z as ‘the correlation between x and y with the influence of z eliminated, or considered fixed and constant’. This technique can as well be applied to four or more variables, and the formula to be used in that case stands as follows:

$$r_{xy.zm} = \frac{r_{xy.z} - r_{xm.z}\,r_{ym.z}}{\sqrt{(1 - r_{xm.z}^2)(1 - r_{ym.z}^2)}}$$

Here m stands for the fourth set of variables, and the problem involved is that of the correlation between x and y keeping both z and m constant; it requires the calculation of three other partial correlations, each keeping only one variable constant.

Calculation of Partial Correlation

If, for instance, we have three sets of data x, y and z under examination, and the correlations of xy, xz and yz are respectively as follows:

rxy = 0.50
rxz = 0.40
ryz = 0.60

then the correlation between x and y, keeping z constant, will be calculated as follows:

$$r_{xy.z} = \frac{0.50 - (0.40 \times 0.60)}{\sqrt{(1 - 0.16)(1 - 0.36)}} = \frac{0.50 - 0.24}{\sqrt{0.84 \times 0.64}} = \frac{0.26}{0.7332} = 0.35$$
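A Python sketch of the first-order partial coefficient, checked against the worked example (rxy = 0.50, rxz = 0.40, ryz = 0.60). The function name is illustrative.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_xy_z = partial_r(0.50, 0.40, 0.60)
print(f"r_xy.z = {r_xy_z:.4f}")
```

The second-order coefficient rxy.zm follows by calling the same function with the three first-order coefficients, `partial_r(r_xy_z, r_xm_z, r_ym_z)`, exactly as in the four-variable formula.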
Multiple Correlation

In the study of partial correlation our problem was to determine the relationship between two variables x and y, keeping the third variable z constant. In multiple correlation, on the other hand, our problem is to measure the combined effect of two or more independent variables upon the dependent variable. Calculation of the coefficient of multiple correlation becomes easier if the following formula is resorted to:

$$1 - r_{x(yz)}^2 = (1 - r_{xy}^2)(1 - r_{xz.y}^2)$$

After finding the value of 1 − r²x(yz), we simply subtract the result from 1 to get the value of r²x(yz), and then by extracting the square root we get the value of rx(yz), which expresses the coefficient of multiple correlation.

(It should be noted again in this connection that a table giving the values of 1 − r² for all values of r to 3 places of decimals may be found in Tables for Statisticians and Biometricians, and also in J. R. Miner's Tables of √(1 − r²) and 1 − r² for Use in Partial Correlation and in Trigonometry, Baltimore, 1922, pp. 49.)
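The multiple coefficient can be computed the same way in Python. In this sketch x is the variable being explained, as in the formula for rx(yz), and the coefficients of the partial-correlation example (rxy = 0.50, rxz = 0.40, ryz = 0.60) are reused purely for illustration. The first step finds rxz.y, the partial correlation of x and z with y held constant.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """Correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def multiple_r(r_xy, r_xz, r_yz):
    """r_x(yz) from 1 - r_x(yz)^2 = (1 - r_xy^2)(1 - r_xz.y^2)."""
    r_xz_y = partial_r(r_xz, r_xy, r_yz)   # x and z, with y held constant
    one_minus_r2 = (1 - r_xy ** 2) * (1 - r_xz_y ** 2)
    return math.sqrt(1 - one_minus_r2)

R = multiple_r(0.50, 0.40, 0.60)
print(f"r_x(yz) = {R:.4f}")
```

The result agrees with the equivalent direct formula r²x(yz) = (r²xy + r²xz − 2 rxy rxz ryz)/(1 − r²yz), which gives 0.17 ÷ 0.64 = 0.2656 here.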
Probable Error of Partial & Multiple Correlation

The significance of the partial and multiple correlation coefficients can be easily determined by means of their probable errors, which are symbolically expressed by the following formulas:

Probable error with three variables:

$$\text{p.e. of } r_{xy.z} = \frac{0.6745\,(1 - r_{xy.z}^2)}{\sqrt{n}}$$

Probable error with four or more variables:

$$\text{p.e. of } r_{xy.zm} = \frac{0.6745\,(1 - r_{xy.zm}^2)}{\sqrt{n}}$$

Probable error of multiple correlation:

$$\text{p.e. of } r_{x(yzm)} = \frac{0.6745\,(1 - r_{x(yzm)}^2)}{\sqrt{n}}$$

In all three cases n is the total number of cases used in the problem concerned.

Coefficients of Regression

Associated with correlation is the study of Regression. The term Regression is borrowed from biology. As applied to economic or business statistics, it means the tendency exhibited by either of two correlated series to revert, or regress, toward its characteristic type. The study of regression is useful for predictive purposes, and the degree of relationship that exists between correlated items in the two series having a tendency toward reversion or regression is shown by the line of regression, which indicates the law of change in the mean of one variable for unit change in the other; if the line is straight the regression is said to be linear.

To calculate the ordinates of the regression line, the simple arithmetic mean of each of the two series is determined, as well as the coefficient of correlation between the corresponding items in the two series. The inclinations of the two regression lines to the horizontal and to the vertical respectively are measured by the expressions

$$r\,\frac{\sigma_y}{\sigma_x} \qquad \text{and} \qquad r\,\frac{\sigma_x}{\sigma_y}$$

and these are called the coefficients of regression. The former of these expressions is the regression coefficient of y on x, and the latter that of x on y. The values of y, the dependent variable, most likely to be associated with given values of x, the independent variable, are obtained by employing the ordinary formula for straight-line relationships,

$$Y = a + bx$$

in which Y is the ordinate of the straight line of average relationship, a the arithmetic average (of the y items), b the coefficient of regression, and x the distance on the x-axis from the point where the mean of the x items coincides with the mean of the y items. The standard deviation is calculated either by squaring the deviations from the mean, summating, dividing by the number of deviations and extracting the square root, or by squaring the actual items, summating, dividing by the number of items, subtracting the square of the mean, and then extracting the square root.
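The two ways of finding the standard deviation give the same result, as this Python sketch shows with the jute production figures of the Pearsonian example (1932 taken as 66); both give about 11.43. In floating-point arithmetic the first route is usually preferred, since subtracting the square of the mean from the mean of the squares can lose precision when the items are large compared with their spread.

```python
import math

def sd_from_deviations(items):
    """Square the deviations from the mean, average them, take the root."""
    n = len(items)
    mean = sum(items) / n
    return math.sqrt(sum((v - mean) ** 2 for v in items) / n)

def sd_from_squares(items):
    """Average the squared items, subtract the squared mean, take the root."""
    n = len(items)
    mean = sum(items) / n
    return math.sqrt(sum(v * v for v in items) / n - mean ** 2)

production = [102, 66, 88, 88, 98, 86]
print(sd_from_deviations(production), sd_from_squares(production))
```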


In this connection, the significance of the equations

$$b = r\,\frac{\sigma_y}{\sigma_x} \qquad \text{and} \qquad b = r\,\frac{\sigma_x}{\sigma_y}$$

should be carefully noted. They are not interchangeable, as one might suppose them to be. The first equation means that for each change of one unit in the x, or independent, variable from its average change, there is a corresponding change in the y, or dependent, variable of a specified percentage of one unit (represented by the regression coefficient). It gives the slope of the line of closest linear relationship between the two series. The second equation means that for each change of one unit in the y, or dependent, variable there is a corresponding change in the x, or independent, variable of a specified percentage of one unit (represented by the particular regression coefficient in this case) from the average year-to-year change.


In the example cited in connection with the calculation of the Pearsonian Coefficient of Correlation on page 70, the coefficient of regression of y on x, for instance, will be calculated as follows:

$$b = r\,\frac{\sigma_y}{\sigma_x} = 0.769 \times \frac{6.24}{11.40} = 0.421$$
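Because r σy/σx equals Σ(x − A)(y − a) ÷ Σ(x − A)², the regression coefficient can also be computed directly from the totals of the Pearsonian table. A Python sketch with the jute figures (1932 production taken as 66): directly, b for y on x is 328.0 ÷ 784, about 0.418, close to the 0.421 obtained from the rounded r and standard deviations; the product of the two regression coefficients equals r².

```python
production = [102, 66, 88, 88, 98, 86]               # x
price = [41.00, 25.25, 24.25, 29.75, 37.75, 34.00]   # y

n = len(production)
A, a = sum(production) / n, sum(price) / n
sxy = sum((x - A) * (y - a) for x, y in zip(production, price))
sxx = sum((x - A) ** 2 for x in production)
syy = sum((y - a) ** 2 for y in price)

b_yx = sxy / sxx   # regression of y on x, equal to r * sigma_y / sigma_x
b_xy = sxy / syy   # regression of x on y, equal to r * sigma_x / sigma_y
r_squared = sxy ** 2 / (sxx * syy)

b_book = 0.769 * 6.24 / 11.40   # the calculation in the text
print(f"b (y on x) = {b_yx:.3f}, b (x on y) = {b_xy:.3f}, text: {b_book:.3f}")
```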


Predictive Equations 

If it is realized that the line of regression is a means of determining the most probable value of the dependent variable when the size or magnitude of the independent variable is known, it is then easy to see that the regression equation can as well be used for predicting the future trend of any dependent series of variables. But a question that may be pertinently asked in this connection is whether the actual conditions will coincide with the ordinate of a calculated trend of relationship, that is, the ordinate of the regression line. There may, as a matter of fact, be considerable deviations from the ordinate of such a calculated trend of relationship. With the help of the coefficient of regression (which is an essential factor in the predictive equation) we can indeed improve the value of the estimates if the following formula is used:

$$Y = A_y - b\,A_x + b\,x$$

For instance, in a problem of estimating the future acreage of a crop on the basis of current price, we may substitute the following values for the symbols used in the above equation:

Y = the percentage change in acreage.
Ay = the average of the percentage changes in acreage.
b = the coefficient of regression of acreage on price.
x = the percentage change in price.
Ax = the average of the percentage changes in price.

(For b use the formula b = r σy/σx.)
In other words, from the average percentage change in the acreage we simply subtract the product of the coefficient of regression and the average percentage change in price, and add the product of the coefficient of regression and the percentage change in price. The method is illustrated in the subjoined table:


Year    Actual       % Change    Mid-Dec.    % Change    Estimated % Change
        Acreage      of          Price       of          in Acreage
        (lakhs of    Acreage     (Rs.)       Price       Y = Ay - bAx + bx
        acres)
1927     36.30        24.04      63.50         3.24        1.65
1928     33.71        -7.10      66.50         4.72        1.32
1929     31.31        -7.11      54.50       -18.00       -1.60
1930     33.17         3.92      29.75       -45.41       -7.63
1931     34.80         5.10      41.00        37.81        0.90
1932     18.62       -46.50      25.50        38.04        6.00
1933     21.43        15.05      24.25        -4.80       -1.29
1934     25.18        17.47      29.75        22.07        2.62
1935     26.11         3.71      37.75        26.88        3.56
1936     21.81       -16.43      34.75        -7.95       -0.61
1937     28.22        29.42      34.25        -1.44       -2.06
1938     31.60        12.15      33.75        -1.46       -2.04
Total   342.38        30.43     475.25        54.22
Mean                   3.35                    4.52
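The predictive equation can be sketched in Python as below. The helper name `estimated_change` is hypothetical; the inputs $A_y = 3.35$, $b = 0.22$ and $A_x = 4.52$ are the figures given with the table, and the sketch has been checked only against the 1929 and 1930 rows.

```python
def estimated_change(mean_y, b, mean_x, x):
    """Predictive equation Y = Ay - b*Ax + b*x, all quantities in percentage changes."""
    return mean_y - b * mean_x + b * x

# Figures from the jute table: Ay = 3.35, b = 0.22, Ax = 4.52
A_Y, B, A_X = 3.35, 0.22, 4.52

for year, price_change in [(1929, -18.00), (1930, -45.41)]:
    y = estimated_change(A_Y, B, A_X, price_change)
    print(year, f"{y:.2f}")
# 1929 -1.60
# 1930 -7.63
```

Since $Y = A_y + b(x - A_x)$, a year whose price change equals the average $A_x$ is simply assigned the average acreage change $A_y$.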
Our problem here is to estimate the acreage of the jute crop for 1939, and, be it noted, for that year alone and not for any other year in the series. The last column shows the estimated percentage changes in acreage obtained with the help of the predictive equation $Y = A_y - bA_x + bx$. To apply this equation we had to calculate the coefficient of regression of jute acreage from 1927 to 1938 on mid-December prices from 1926 to 1938 (the price for 1926 was Rs. 61.50). The coefficient of regression in this case is 0.22; this is the value of $b$ in the equation. The value of $A_y$ is the mean of column 3 in the table above, and that of $A_x$ is the mean of column 5. The term $bA_x$ is obtained by multiplying this mean percentage change in price by the coefficient of regression, and the term $bx$ by multiplying the coefficient by each year-to-year percentage change in price shown in column 5. Once we have thus found the estimated percentage change in acreage (the figures relate to the next year), it is easy to work out the probable acreage itself: multiply the acreage of the year immediately preceding the year for which the acreage is to be estimated by the percentage change, and then add or subtract the product, as the case may be. This is illustrated by the following table:


Year    Estimated % Change    Estimated Total Acreage    Actual Acreage
        in Acreage            (lakhs of acres)           (lakhs of acres)
1928       1.65                  36.89                      33.71
1929       1.32                  34.16                      31.31
1930      -1.60                  30.81                      33.17
1931      -7.63                  30.64                      34.81
1932       3.96                  36.93                      18.62
1933       6.00                  18.73                      21.43
1934      -1.29                  21.17                      25.18
1935       2.62                  25.75                      26.11
1936       3.55                  27.03                      21.81
1937      -0.61                  21.68                      28.22
1938      -2.03                  27.64                      31.66
1939      -2.04                  31.02                      31.60


Thus the estimated total acreage for 1939 according to the predictive equation $Y = A_y - bA_x + bx$ is 31.02 lakhs of acres, as against the actual acreage of 31.60 lakhs of acres, a variation of only 1.8 per cent from the actual.
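The step from an estimated percentage change to an estimated acreage, and the final comparison with the actual 1939 figure, can be sketched as follows. The function names are illustrative only, and the deviation is taken as a percentage of the actual figure.

```python
def estimated_acreage(previous_acreage, pct_change):
    """Apply an estimated percentage change to the preceding year's acreage."""
    return previous_acreage * (1 + pct_change / 100)

def percent_deviation(estimate, actual):
    """Deviation of the estimate from the actual, as a percentage of the actual."""
    return abs(actual - estimate) / actual * 100

# 1931: 1930 acreage of 33.17 lakhs with an estimated change of -7.63 per cent
print(f"{estimated_acreage(33.17, -7.63):.2f}")   # 30.64
# 1939: estimate 31.02 against actual 31.60 lakhs of acres
print(f"{percent_deviation(31.02, 31.60):.1f}")   # 1.8
```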

The slight variations that are noticeable between the estimated and actual figures are indeed negligible for the purpose. It should be noted that where multiple correlation comes into the problem, the value of predictive equations becomes doubtful. In this connection Harper has very rightly observed: “The value of multiple regression in forecasting is probably sometimes over-emphasized, particularly in regard to strictly economic data.”

Standard Error of Estimates 

You must have noted by this time that, although it may not invariably be the case, generally speaking the accuracy of any estimate or prediction “varies directly with the magnitude of the coefficient of correlation.” On the other hand, if there be “abnormally large variables in one series not compensated for in another, we may obtain a high expression of relationship which is not a true index of actual cause and effect.” Where, therefore, there is not much consistency of relationship, the standard error of estimate is used to determine the “extent to which estimated or predicted values deviate from actual values.” “To calculate it, we simply extract the square root of the mean-square deviation of estimated or predicted values from the actual.” “The standard error thus calculated may be taken to indicate that if the curve of estimates were normal, a distance equal to the standard error measured from either side of the mean of estimates would include approximately 68.26 per cent of all the estimated values. When the standard error of estimate is larger than the standard deviation of actual values, the estimating or predictive equation cannot be relied upon.”
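A minimal sketch of the standard error of estimate and the reliability test just quoted. The figures are hypothetical, and both quantities use the population form (dividing by the number of items), which we assume matches the "mean-square" wording above.

```python
import math

def standard_error_of_estimate(actual, estimated):
    """Root-mean-square of the deviations of estimated from actual values."""
    squares = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squares) / len(squares))

def standard_deviation(values):
    """Root-mean-square of the deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Hypothetical actual and estimated values, for illustration only
actual = [20.0, 24.0, 22.0, 27.0, 25.0]
estimated = [21.0, 23.0, 23.0, 26.0, 26.0]

se = standard_error_of_estimate(actual, estimated)
sd = standard_deviation(actual)
print(f"standard error = {se:.2f}, standard deviation = {sd:.2f}")
# The equation is not to be relied upon if the standard error exceeds the SD
print("usable" if se < sd else "not to be relied upon")
```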

An alternative method for determining the error of estimates is by the Coefficient of Alienation. This is based upon “the mean of squared deviations of actual items from their arithmetic mean and on the mean of squared deviations of estimated or predicted values from the actual values, and it involves the calculation of both the standard error of estimate and the standard deviation.” The formula employed for this may be stated as follows:

$CA = \frac{SE_y^2}{SD_y^2}$

where $CA$ = the coefficient of alienation; $SE_y^2$ = the square of the standard error of estimate; and $SD_y^2$ = the square of the standard deviation of actual values.

“Since the standard deviation is merely the root-mean-square of the deviations from the arithmetic mean of the actual data, and since the standard error of estimate is simply the root-mean-square of the differences between the actual and estimated or predicted values, it necessarily follows that when the square of the standard error is divided by the square of the standard deviation a quotient is obtained that represents the root-mean-square of the differences between the actual and estimated or predicted values in terms of the root-mean-square deviations of actual values from their arithmetic mean. Merely dividing the standard error by the standard deviation would give a quotient of little or no significance. This is because both the standard error and the standard deviation are derived from the means of squared deviations, and not simply from the deviations as such. It is no more than a truism to state that squares of numbers do not increase in the same proportion as the magnitudes of the numbers themselves. The square of 12, for example, is 144, whereas the square of 17 is 289, and 289 is more than twice the size of 144, although 17 is less than one and one-half times the size of 12.” (Harper, Elements of Practical Statistics.)
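The ratio $CA = SE_y^2 / SD_y^2$ can be computed as below, with hypothetical inputs; the last line repeats Harper's comparison of 12 and 17. Note that many other texts give the name coefficient of alienation to the square root of this ratio, and that for a straight line fitted by least squares the ratio as defined here equals $1 - r^2$.

```python
def coefficient_of_alienation(se, sd):
    """Squared standard error of estimate divided by the squared standard deviation."""
    return se ** 2 / sd ** 2

# Hypothetical: standard error of estimate 1.0, standard deviation 2.4
print(f"{coefficient_of_alienation(1.0, 2.4):.3f}")

# Harper's point: 17 is under 1.5 times 12, yet 17 squared is over twice 12 squared
print(17 / 12, 17 ** 2 / 12 ** 2)  # roughly 1.42 versus 2.01
```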



CHAPTER IX 


BUSINESS FORECASTING 

The basic theory of business forecasting rests on the belief that, in normal circumstances, there is a definite cycle of ups and downs of business activity. Such ups and downs of business activity fall under four distinct categories:

I. Secular or long-period trends, caused by:

(i) growth of population,

(ii) improvements in methods of production,

(iii) exhaustion of natural resources,

(iv) decline in demand due to change in habits,

(v) competition of substitutes,

(vi) efficiency of management, and

(vii) change in the purchasing power of money.

II. Cyclical movements, caused by:

(i) maladjustment of economic activity,

(ii) psychology of the business community,

(iii) monetary conditions, and

(iv) political conditions.

III. Seasonal variations, caused by:

(i) the difference in the length of months,

(ii) holidays, and

(iii) the difference in weather conditions.

IV. Accidental detriments, caused by:

(i) wars,

(ii) strikes and lock-outs,

(iii) natural catastrophes, e.g., floods, earthquakes, etc., and

(iv) changes in management or policy.

Of these four categories of ups and downs in the business field, no forecasting is possible in regard to accidental detriments, insurance being the only safeguard in the matter. Seasonal variations move with greater regularity than any other movements, and forecasts about them are easy with detailed weather and crop reports. Scientific forecasting concerns itself only with the forecasting of cyclical movements, so that with a knowledge of such forecasts the effects of secular trend may be eliminated, the intensity of the trade cycle itself may be reduced, and the range of price fluctuations lessened. Various control measures are adopted for this, the most familiar among them being the regulation of credit and currency policy. This is usually done through the central banking institution.

Relevant statistical series generally employed for the purpose of forecasting the sequences of a trade cycle are the figures of production in relation to costs and stocks, bank clearings, bank deposits, railway goods traffic, electric power production, wholesale prices, employment and unemployment, export and import trade, pig iron production, business failures, wholesale and retail sales, stock exchange transactions, money rates and the cost of living index. In the selection of such data the following qualities should be kept in mind: representativeness, reliability, sensitiveness and frequency of publication. Besides, the information must be up-to-date.


The various series just named for the measurement of business activity are examined not only as single series; composite indices are also constructed for the purpose. In this connection Dr. J. H. Richardson offers a very pertinent note of warning: “Composite indexes are of value for broad general purposes, but it is essential to examine also the individual series upon which they are based. Much information of real importance may be concealed from view in a composite index. In fact, equal but opposite tendencies shown by two individual series will be cancelled when they are combined. Thus, a composite index may show the general state of trade to be the same now as a year ago. But this may be the result of a considerable increase of activity in some industries and of decline in others. Such a change may foreshadow the development of serious disequilibrium which will precipitate a crisis. This danger can be revealed only by a study of the individual series. The need for this study is generally recognized by forecasting services which give in full detail the individual series from which their composites are constructed. They are fully aware that a satisfactory review of the general state of trade can be made only by an examination both of a composite index and of separate data indicating the situation of each of the important branches of business activity.”
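Richardson's point about opposite tendencies cancelling out can be seen in a tiny example. The two industry indices are hypothetical, and the composite is assumed to be an unweighted arithmetic mean of the individual indices.

```python
def composite_index(*series):
    """Unweighted composite: the arithmetic mean of the individual indices for each period."""
    return [sum(values) / len(values) for values in zip(*series)]

# Hypothetical activity indices for two industries over three years
rising = [100, 110, 120]
falling = [100, 90, 80]

print(composite_index(rising, falling))  # [100.0, 100.0, 100.0]
```

The composite stays flat at 100 although one industry has grown by a fifth and the other shrunk by a fifth, which is exactly the kind of concealed disequilibrium the quotation warns about.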

Here we must say something about the statistical methods to be applied in dealing with these statistical series. In the first place, for convenience of comparison, index numbers are constructed of the relevant series. As trade cycle fluctuations are complicated by secular trends, these should be determined and eliminated. As shown below, the line of secular trend is calculated with the aid of the method of least squares, and once it is drawn on a chart it can easily be eliminated. It can also be eliminated by means of moving averages.

For the elimination of seasonal fluctuations the London and Cambridge Economic Service uses the following method. ‘Suppose the statistical series under consideration shows a figure for each month for twenty years. Then the average annual figure for the whole period is calculated, and also the average of the twenty January figures, the average of the February figures, and so on. The average figure for each month is then expressed as a ratio of the annual average, and these ratios are then used for the elimination of seasonal variation from the actual figures. The same process would be applied for this purpose to the figures for each month throughout the entire period covered by the statistics.’
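The procedure can be sketched as follows. We assume that the "annual average" means the average monthly figure over the whole period, so that the twelve ratios average to one, and that each figure is adjusted by dividing it by its month's ratio; the function names and the data are hypothetical.

```python
def seasonal_ratios(years):
    """years: a list of years, each a list of 12 monthly figures (January first).

    Returns each month's average over all years as a ratio of the average
    monthly figure for the whole period.
    """
    n = len(years)
    month_avgs = [sum(year[m] for year in years) / n for m in range(12)]
    overall_avg = sum(month_avgs) / 12
    return [avg / overall_avg for avg in month_avgs]

def remove_seasonal(years, ratios):
    """Divide every monthly figure by the ratio for its month."""
    return [[value / ratios[m] for m, value in enumerate(year)] for year in years]

# Hypothetical series: a low first half-year and a high second half-year
pattern = [0.8] * 6 + [1.2] * 6
example = [[100 * p for p in pattern], [110 * p for p in pattern]]

ratios = seasonal_ratios(example)
adjusted = remove_seasonal(example, ratios)
print([round(r, 2) for r in ratios])
print([round(v, 1) for v in adjusted[1]])
```

With a purely seasonal pattern, the adjusted figures come out flat within each year, leaving only the change in level from one year to the next.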

Other techniques generally adopted in connection with forecasting are the method of standard deviation for the measurement of variability and that of correlation for the establishment of causal relationships between associated groups.


Determination of Secular Trend 

Of vital interest to the businessman is undoubtedly the technique employed for determining the general inclination of a time series over a long period of time. This general inclination or tendency of a time series is known as the Long-term or Secular Trend. Various methods are employed for fitting a line or curve to this type of data.

When only an approximation of the general tendency of the series is desired, the best method appears to be the fitting of a trend by the free-hand method. This consists of drawing a line through a graph in such a way as to represent the general course of the series.
The fitting of the free-hand trend is sometimes made easier by plotting a three-year or five-year moving average as a stepping stone in the process. “The essential feature to remember is that the first ordinate of the moving trend is always plotted for the mid-year of the series for which the average is calculated, or for any other mid-point that may be represented on the x-axis.” It should be remarked that although the moving average is significant enough “in making approximations of the general movements in a series, and particularly in eliminating a large part of the cycle that is rather regular,” yet it is ‘not satisfactory when there is no pronounced cycle and when the curve representing the actual values in the series shows sudden or marked changes in any direction that are not a part of the general cyclical movement or tendency.’
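A moving average of the kind described, with each average plotted against the middle year of its period, might be computed like this. The data are hypothetical and an odd period (three or five years) is assumed.

```python
def centred_moving_average(years, values, period=3):
    """Return (year, average) pairs, each average placed at the middle year of its window."""
    if period % 2 == 0:
        raise ValueError("use an odd period so that each average falls on a whole year")
    half = period // 2
    return [
        (years[i + half], sum(values[i:i + period]) / period)
        for i in range(len(values) - period + 1)
    ]

# Hypothetical annual figures
years = [1931, 1932, 1933, 1934, 1935]
values = [10, 14, 12, 16, 18]

for year, avg in centred_moving_average(years, values):
    print(year, round(avg, 2))
```

Note that a three-year average loses one year at each end of the series, and a five-year average loses two.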

There is also another method for determining the secular trend, known as the Straight Line of Least Squares. It is particularly useful in connection with a series that shows a general movement in one direction over a long period of time. This technique cannot, however, be applied to a series which, having first shown an upward movement, later moves downward in a decided slope. The application of this technique is illustrated in the subjoined table:

Straight Line Trend of Yield of Jute per Acre in India, 1921-1939
(Line of Least Squares)

Col. 1: Year
Col. 2 (y): Yield per acre in lb. (Mean = 1413)
Col. 3 (x): Deviation from the midpoint
Col. 4 (xy): Product of yield per acre and deviation from the midpoint (Col. 2 × Col. 3)
Col. 5 (x²): Square of deviation from the midpoint (Sq. of Col. 3)
Col. 6 (Y): Ordinate of the straight line trend of least squares

Year      y      x        xy      x²      Y
1921   1241     -9   -11,169      81   1557
1922   2080     -8   -16,640      64   1541
1923   1720     -7   -12,040      49   1525
1924   1601     -6    -9,606      36   1509
1925   1320     -5    -6,600      25   1493
1926   1400     -4    -5,600      16   1477
1927   1360     -3    -4,080       9   1461
1928   1321     -2    -2,642       4   1445
1929   1320     -1    -1,320       1   1429
1930   1281      0         0       0   1413
1931   1040      1     1,040       1   1397
1932   1400      2     2,800       4   1381
1933   1641      3     4,923       9   1365
1934   1360      4     5,440      16   1349
1935   1480      5     7,400      25   1333
1936   1420      6     8,520      36   1317
1937   1420      7     9,940      49   1301
1938   1241      8     9,928      64   1285
1939   1201      9    10,809      81   1269
Total            0    -8,897     570
The main point to be remembered in this connection is that the ordinates of the straight line trend of least squares are determined with the help of the formula:

$Y = a + bx$

where $a$ = the arithmetic mean of the series; $b$ = the slope of the line of least squares; $x$ = the deviation from the point of origin; $Y$ = the ordinate.

The slope of the line of least squares is calculated with the formula:

$b = \frac{\Sigma xy}{\Sigma x^2}$

where $b$ = the slope of the line of least squares; $\Sigma xy$ = the sum of the products of each item in the series and its corresponding deviation from the midpoint; $\Sigma x^2$ = the sum of the squares of the deviations from the midpoint.

In the above example $b = \frac{-8{,}897}{570} = -15.6$, or approximately $-16$, so that

$Y = a + bx = 1413 + (-16 \times \text{deviation from the point of origin})$.
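The calculation can be reproduced with the yields of the jute table. This sketch assumes an odd number of years, so that the midpoint falls on a year and $\Sigma x = 0$, which is why $a$ comes out as the arithmetic mean; as in the text, the slope is rounded to $-16$ before the ordinates are computed. The function name is our own.

```python
def least_squares_trend(values):
    """Fit Y = a + b*x, measuring x from the middle year of an odd-length series."""
    n = len(values)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of years")
    xs = [i - n // 2 for i in range(n)]          # ..., -2, -1, 0, 1, 2, ...
    a = sum(values) / n                           # sum of x is 0, so a is the mean
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)
    return a, b, xs

# Yield of jute per acre in lb., 1921-1939, from the table above
yields = [1241, 2080, 1720, 1601, 1320, 1400, 1360, 1321, 1320, 1281,
          1040, 1400, 1641, 1360, 1480, 1420, 1420, 1241, 1201]

a, b, xs = least_squares_trend(yields)
print(a, round(b, 2))                             # 1413.0 -15.61
trend = [round(a + round(b) * x) for x in xs]     # slope rounded to -16, as in the text
print(trend[0], trend[-1])                        # 1557 1269
```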



CHAPTER X 


STATISTICAL REASONING 

The interpretation of statistics, or the drawing of inferences from numerical facts, is a job for the expert. Statistical methods are indeed most dangerous devices in the hands of the inexpert. Statistics should, therefore, be handled only by experts.

Blunders in statistical reasoning may be due to various causes. But the most common sources of error are the use of inaccurate or incomplete data, dishonest or misleading methods of presentation, false generalizations and faulty use of statistical methods. We have already seen that the patient collection of facts or data constitutes the first and most important step in all statistical investigations. For no statistical results can be arrived at that are not already implicit in the data, and the accuracy of the former depends on that of the latter. In this connection we have noticed that one of the common statistical methods for the collection of data is Sampling. It is from the characteristics of the sample that we infer the characteristics of the population, or the bulk from which the sample is taken. A population consists of individuals, and although the individuals in a population vary, nevertheless they merge in the population and their individuality is lost. The formulas and laws describing the behaviour of populations as opposed to individuals are known as Statistical Laws. One of the fundamental laws warranting the reliability of sample data is the Law of Statistical Regularity, based upon the mathematical Theory of Probability. But there may be errors of sampling, and ‘the error in a sample result depends on the size of the sample, on the nature of the bulk being sampled (particularly on the variation within it) and the way in which the sample is taken.’ Broadly speaking, other things being equal, such sampling errors are proportional to the amount of variation in the population. The error, however, decreases as the size of the sample is increased. Errors of a sample may be either a Standard Error or an Error of Bias. Mathematically speaking, the standard error is inversely proportional to the square root of the number in the sample. On the other hand, error of bias may arise in representative sampling of the type used in Gallup polls of public opinion, and it is necessary to use very elaborate sampling methods to avoid errors of this kind. As an example of such a “biassed” use of statistics, we may cite here the results of the Gallup Polls on the American Presidential election of 1944, held in August, that is to say, thirteen weeks before the actual election. The Fortune survey showed 52.5 per cent in favour of Roosevelt, 43.9 per cent in favour of Dewey and 3.6 per cent “Don’t knows.” This was found to conform more closely to the actual results of the election held the following November. Simultaneously, Pollster George Gallup also had a score sheet, which revealed 51.3 per cent of votes in favour of Dewey and 48.7 per cent in favour of Roosevelt, which was evidently based on a biassed sample and, therefore, yielded a wrong result.
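The square-root law for the standard error, stated in the paragraph above, can be illustrated with the usual formula for the standard error of a sample mean, $\sigma/\sqrt{n}$, assuming simple random sampling and a hypothetical standard deviation of 10. It says nothing about the error of bias, which a larger sample does not remove.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of a sample mean under simple random sampling: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Hypothetical population standard deviation of 10 units
for n in (25, 100, 400):
    print(n, standard_error_of_mean(10, n))
# 25 2.0
# 100 1.0
# 400 0.5
```

Each fourfold increase in the size of the sample only halves the standard error.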


Use of irrelevant or incomplete data, and the drawing of false generalizations therefrom, is a common trick of propagandists and commercial advertisers. Some interesting examples of the insidious presentation of statistical data to slip dubious arguments past the public are cited by Mr. L. H. C. Tippet in his recent work on Statistics published by the Oxford University Press. The following advertisement appeared in 1931: “It is men of exceptional experience who are buying X. cars today. 87 per cent of X. cars today are bought by men who have owned six other makes of cars before.” Mr. Tippet comments: “I suppose it is unlikely that as many as 87 per cent of all makes of cars are bought by such veterans as those mentioned in the advertisement, and the purchasers of X. cars are probably exceptional, but they may be exceptional in their fickleness; and do the makers of X. cars wish us to believe that they do not get many repeat orders? These are possible interpretations of the data.”

Another possible explanation is that X. cars were not very low-priced, and, statistically speaking, salaries increase with age. Another example is taken from the report of an enquiry, instituted some years before the last war, into the effects of the use of oatmeal among children and in public institutions. The Report states: “In Manchester 2,333 children in all were questioned. In one school of 200, 84 per cent were regular users, and the teacher stated, judging from regularity of attendance, that those getting oatmeal were the most satisfactory. In a girls’ school of 182 pupils, 33 had porridge, and the headmistress reported: the majority of oat users are strong children, well-nourished, and class work good. The non-users are not strong and more liable to take colds and infectious diseases, and class work only moderately good.” On this Mr. Tippet comments: “Now the essential information in the above extract is the better health of children who use oatmeal, but this is given only in vague qualitative terms. No statistician would rely on such general impressions as are quoted. What were their sickness records? Moreover, the poorness of the data is covered over, doubtless unintentionally, by some very exact but irrelevant figures. It does not matter two hoots how many children were questioned, or how many took porridge. Without these figures, the data would be seen to be what they are: weak. Most people take milk with porridge, which might be extra to milk taken otherwise, and that might be the cause of the improved health. All we know, if we know anything from the data, is that oatmeal plus milk plus the condiments are good for health as compared with the food that is eaten as an alternative.”

Something may now be said about the fallacies of statistical reasoning due to the limitations of statistical methods. Thus, although there is no single statistical quantity more valuable than the average, statisticians are at great pains to stress the inadequacy of this constant. Professor Bowley, for instance, observes: “Of itself an arithmetical average is more likely to conceal than to disclose important facts; it is of the nature of an abbreviation, and is often an excuse for laziness.” In fact, the average fails to measure the important facts that arise from variation. As ‘the strength of a chain is the strength of its weakest link, not that of the average link,’ so ‘when data are in the form of a time series and averages are taken over a long period of time, they are apt to conceal important changes in the trend.’ Mr. Tippet remarks: “A development of great importance in applied statistics has taken place during the past decade or so, and the results form a recognition of the fact that variation is a composite quantity, resulting from the combined effects of a multitude of factors.” This is particularly evident in the wrong interpretation of Index Numbers, in the making of which averaging plays an important part. For instance, from a rise in a price index it may be argued that there is inflation in the country. But the value of such an argument is dubious unless it is recognised that a multitude of causes affect the index numbers and all the various factors concerned are taken into account. The index number here merely reveals a change in relationship and does not prove a case. It has indeed been very rightly observed that ‘the use of statistical data to prove a case, in the sense of demonstrating it, is unscientific; but their use to prove a case in the old-fashioned sense of testing it is scientific and profitable. A statistical inquiry should be approached with a mind that is open but not empty.’
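Bowley's warning can be made concrete with two hypothetical series that share the same average but differ greatly in variation. As with the weakest link of a chain, the minimum and the spread tell a story that the average hides. The sketch uses the standard-library `statistics` module.

```python
from statistics import mean, pstdev

# Two hypothetical sets of monthly sales figures with the same average
steady = [100, 101, 99, 100, 100, 100]
erratic = [60, 140, 70, 130, 90, 110]

for name, series in (("steady", steady), ("erratic", erratic)):
    # average, standard deviation, and weakest month
    print(name, mean(series), round(pstdev(series), 1), min(series))
```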

Many faulty arguments also arise from the wrong interpretation of the Coefficient of Correlation. As an instance of such nonsense correlations, we may cite the fact “that the proportion of marriages solemnized in the Church of England and the death rate for the country have for many years been decreasing; there is a correlation between the two.”
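The marriages and death-rate example can be imitated with two hypothetical series that simply decline together. The Pearsonian coefficient comes out close to +1 although neither series is assumed to have anything to do with the other; the figures are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearsonian coefficient of correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical figures: two unrelated series that both decline year by year
church_marriage_share = [70, 68, 67, 65, 62, 61, 59]      # per cent
death_rate = [18.0, 17.5, 17.1, 16.4, 15.9, 15.2, 14.8]    # per 1,000

print(round(pearson_r(church_marriage_share, death_rate), 3))
```

A shared downward trend alone is enough to produce a high coefficient, which is why the trend should be examined before any causal reading is given to the correlation.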

So much indeed for the “don’ts” that the statistician should beware of in handling statistical data. Let us now consider the “do’s” that should be observed when drawing inferences from statistical data. The statistical method, it should always be borne in mind, is not different from the general scientific method. As a matter of fact, it is part of the same method, and is based on the same fundamental ideas and processes. In other words, statistical reasoning is not different from any other kind of reasoning. And as in any other field of knowledge, the use of good working hypotheses is the most essential aspect of statistical reasoning. “The ability to formulate fruitful hypotheses and design experiments to test them is the quality of a first-rate scientist. In addition to this personal quality, habits of thought and even prejudice have their influence on the kinds of hypothesis that will be entertained. For this reason impartiality is essential; and an investigator is most likely to be impartial if he is disinterested in the issue of the inquiry. The investigator should not be narrow-minded, and should be prepared to consider any reasonable alternative to the main hypotheses he favours, but he cannot afford to waste his time on unreasonable ones.”

As a matter of fact, a statistician should not be dogmatic about his conclusions. He should indeed have a ‘critical apparatus sufficiently well-developed and discriminative to prevent an undue proportion of false conclusions being reached as a result of statistical inquiries.’ He should take an impartial view of things and should not suffer himself to be swayed by any emotional appeal. For instance, in connection with the enquiry relating to the benefit of oatmeal at the Manchester schools cited by Tippet, any of the following hypotheses may be adopted: the benefit may be derived from (a) oatmeal alone, (b) milk alone, or (c) neither oatmeal nor milk. The value of these different hypotheses may then be tested by measuring separately the health of children who took (a) oatmeal and milk, (b) oatmeal without milk, (c) milk alone, (d) oatmeal alone, and (e) neither milk nor oatmeal.

When in testing hypotheses it is found that the data are capable of satisfying several reasonable hypotheses, fresh data must be collected before any discrimination can be made between them. In this connection it should, however, be noted that “the favourite hypothesis with which the statistician usually first examines data is that the observed variations and effects are due to random errors or to chance rather than to the operation of newly discovered causes.”
Applied Statistics

Applied Statistics deals with the practical application of statistical rules and formulas to concrete subject matter like prices, production, wages, trade, etc. In recent times much applied statistical work has been done in the business and economic fields. In the U.S.A., for instance, statistics play a large role in the development of general policy, management, production, forecasting of business and trade, separation of cyclical, seasonal, and random movements of business trend, estimating of the elasticity of demand, ascertaining of consumer markets and public opinion, formulation of sales and advertising policies, and, last but not least, the development of life insurance. In connection with such business and economic investigations, statistical data are used either with a view to formulating new theories, or testing existing theories, or providing a measure of quantities that emerge from economic analysis. Needless to say, such investigators need expert knowledge not only of statistical methods but also of the technical aspects of the problems under inquiry.



CHAPTER XI 


CALCULATION BY LOGARITHMS 

Students working in a statistical laboratory have the advantage of many a mechanical device for their calculations. For instance, they have Calculating Machines, Slide Rules, Card-sorting Machines, Correlation Calculators, and so forth.

But as these devices are not readily accessible to the ordinary student, he must instead have some simpler aids, like Tables of Logarithms, Square Roots, Cube Roots, Reciprocals, etc. There are various editions of these in the market, and for ordinary practical purposes the statistician is advised to have a copy of Four Figure Mathematical Tables by Frank Castle (Macmillan), obtainable at the nominal price of a few annas from any bookseller. For more advanced purposes, he is advised to have Chambers' Mathematical Tables, Mathematical Tables No. I (published by the Statistical Laboratory, Calcutta), Tables for Statisticians and Biometricians by Karl Pearson, and Barlow's Tables of Squares, Cubes, Square Roots, Cube Roots and Reciprocals of all Integral Numbers up to 10,000 (E. & F. N. Spon).

In practical work, calculation by logarithms saves much time. A Logarithmic Table, or the Log Table as it is commonly called, is a collection of auxiliary numbers so devised that

(i) multiplication of common numbers can be performed by the addition of their logs,

(ii) division by their subtraction,

(iii) involution or raising of powers by their multiplication, and

(iv) evolution or extraction of roots by their division.

Thus if x and y be two numbers, the log method can be symbolically expressed as follows:

(i) $\log(x \times y) = \log x + \log y$

(ii) $\log(x \div y) = \log x - \log y$

(iii) $\log x^n = n \log x$

(iv) $\log \sqrt[n]{x} = \frac{1}{n} \log x$

This is based upon the simple algebraic Law of Indices, by which we know that

$x^{2} \times x^{2} = x^{2+2} = x^{4}$, or $x^{4} \div x^{2} = x^{4-2} = x^{2}$,

and so forth.

The integral part of a logarithm is called its characteristic and the decimal part is called the mantissa. Only the mantissa is found in the Log Table; the characteristic is obtained by inspection in accordance with the following rule: the characteristic of a number equal to or greater than unity is positive and is one less than the number of digits to the left of the decimal point; the characteristic of a number less than unity is negative and is greater by one than the number of zeros which immediately follow the decimal point.

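A bar is usually put over a negative characteristic. Thus the characteristics of the following numbers are:

    Number      Characteristic      Number        Characteristic
    2134.0      3                   0.2134        $\bar{1}$ (= -1)
    213.4       2                   0.02134       $\bar{2}$ (= -2)
    21.34       1                   0.002134      $\bar{3}$ (= -3)
    2.134       0                   0.0002134     $\bar{4}$ (= -4)

and so forth.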

Thus the characteristic of the logarithms of numbers from 1 to 9 inclusive is 0. In like manner the characteristic of the logarithms of numbers from 10 to 99 is 1, whereas the characteristic of the logarithms of numbers from 100 to 999 is 2. The characteristic of the logarithms of numbers from 1000 to 9999 is 3, and so on for all other numbers.
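The counting rule can be written out directly and compared with the whole-number part, taken downwards, of the true logarithm. This is a sketch under stated assumptions: the function name `characteristic` is hypothetical, numbers are supplied as ordinary decimal strings, and `math.floor(math.log10(...))` serves only as a check on the rule.

```python
import math

def characteristic(number: str) -> int:
    """Characteristic of the log of a positive number given as a plain decimal string."""
    whole, _, fraction = number.partition(".")
    whole = whole.lstrip("0")
    if whole:
        # unity or more: one less than the digits to the left of the decimal point
        return len(whole) - 1
    # less than unity: negative, and one more than the zeros that follow the decimal point
    zeros = len(fraction) - len(fraction.lstrip("0"))
    return -(zeros + 1)

examples = ["2134.0", "213.4", "21.34", "2.134", "0.2134", "0.02134", "0.002134", "0.0002134"]
for s in examples:
    # the counting rule agrees with the floor of the true logarithm
    assert characteristic(s) == math.floor(math.log10(float(s)))
    print(f"{s:>10}  characteristic {characteristic(s)}")
```

The printed characteristics run 3, 2, 1, 0, -1, -2, -3, -4, matching the table of examples.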
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The mantissa may be expressed with a large 
number of digits, but for all practical purposes 
in statistical calculations a four-figure fraction 
will suffice, the last digits being rounded oft 
in the same way that 21*337 per cent is expressed 

as 21*34 per cent. 

Logarithms of numbers are found by first determining the characteristic and then looking up the mantissa. Let us, for instance, find log 21.34. According to the above rule the characteristic is 1. To obtain the mantissa corresponding to 21.34 we refer to the Log Table, look along the row opposite 21 to the column headed 3, and there find .3284. Then, for the fourth significant figure, we refer to the difference column headed 4 at the right-hand side of the Table and get 8 (that is, .0008), which we add to .3284. The mantissa is thus .3292, and the required logarithm is therefore 1.3292. Where no difference column is furnished, the fourth significant figure is allowed for by taking the difference between the mantissae of log 21.3 and log 21.4 (.3304 - .3284 = .0020), multiplying this difference by 0.4 (the fourth figure, 4, taken as tenths), and adding the product, .0008, to the mantissa of log 21.3, which again gives .3292.
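The proportional-parts step can be imitated in a few lines. This is a minimal sketch, assuming a simulated "table" whose entries are `math.log10` values rounded to four places for three-significant-figure arguments; the helper names `table_mantissa` and `mantissa_by_interpolation` are hypothetical.

```python
import math

def table_mantissa(three_figures: int) -> float:
    """Simulated Log Table entry: four-figure mantissa for 3 significant figures (e.g. 213 -> log 2.13)."""
    return round(math.log10(three_figures / 100), 4)

def mantissa_by_interpolation(four_figures: int) -> float:
    """Mantissa for 4 significant figures (1000 to 9999, e.g. 2134 for 21.34),
    found by adding proportional parts of the difference between bracketing entries."""
    lower = four_figures // 10           # first three figures, e.g. 213
    fourth_digit = four_figures % 10     # fourth figure, e.g. 4
    m_low = table_mantissa(lower)        # .3284
    m_high = table_mantissa(lower + 1)   # .3304
    return round(m_low + (m_high - m_low) * fourth_digit / 10, 4)

print(table_mantissa(213), table_mantissa(214))   # 0.3284 0.3304
print(mantissa_by_interpolation(2134))            # 0.3292
```

The fourth figure enters as tenths of the tabular difference, which is why 4 corresponds to a factor of 0.4.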

It should be remembered that the answer to any problem calculated with the aid of logarithms comes out as a logarithm. To find the number corresponding to a given logarithm (known as its antilogarithm), it is only necessary to locate the mantissa of the logarithm in the table and then read off the number corresponding to it. Suppose the logarithm is 1.0000. Reading down the table of logarithms we find the mantissa .0000 located opposite 10 in the column headed 0. The characteristic 1 tells us that there must be two digits in the part of the number preceding the decimal point. Accordingly we record 10 as the whole number and .00 as its fraction, and the number corresponding to the logarithm is therefore 10.00.
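Finding an antilogarithm splits into the same two steps: the mantissa gives the figures and the characteristic places the decimal point. A minimal sketch, assuming Python's `10 ** mantissa` in place of reading a table backwards; the function name `antilog` is hypothetical.

```python
import math

def antilog(log_value: float) -> float:
    """Number whose common logarithm is log_value."""
    char = math.floor(log_value)       # characteristic (negative for numbers below unity)
    mantissa = log_value - char        # always between 0 and 1
    figures = 10 ** mantissa           # the significant figures, from 1 up to (not including) 10
    return figures * 10 ** char        # shift the decimal point by the characteristic

print(antilog(1.0))                 # 10.0
print(round(antilog(1.3292), 2))    # 21.34
print(round(antilog(-1.6708), 5))   # 0.02134; -1.6708 is written .3292 with a barred 2 as characteristic
```

The last line shows why a negative logarithm is written with a barred characteristic: -1.6708 is -2 + .3292, so the mantissa .3292 still gives the figures 2134.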

As we have seen, to perform multiplication by logarithms we add the logarithms of the multiplier and the multiplicand, and their sum is the logarithm of the product. Thus

$\log (21.34 \times 213.4) = \log 21.34 + \log 213.4 = 1.3292 + 2.3292 = 3.6584$,

and antilog 3.6584 = 4554, the required product.
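The same multiplication can be repeated with logarithms deliberately rounded to four places, so that the arithmetic mirrors a four-figure table. This is an illustrative sketch; `log4` and `antilog4` are hypothetical helper names, and rounding the antilogarithm to four significant figures is an assumption about how far a four-figure table can be trusted.

```python
import math

def log4(x: float) -> float:
    """Common logarithm rounded to four decimal places, imitating a four-figure Log Table."""
    return round(math.log10(x), 4)

def antilog4(log_value: float) -> float:
    """Antilogarithm rounded to four significant figures."""
    value = 10 ** log_value
    return round(value, 3 - math.floor(math.log10(value)))

log_product = round(log4(21.34) + log4(213.4), 4)   # 1.3292 + 2.3292
product = antilog4(log_product)

print(log_product, product)   # 3.6584 4554.0
print(21.34 * 213.4)          # direct product, for comparison
```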

To perform division by logarithms we subtract the logarithm of the divisor from the logarithm of the dividend, and the remainder is the logarithm of the quotient. Thus

$\log (213.4 \div 21.34) = \log 213.4 - \log 21.34 = 2.3292 - 1.3292 = 1.0000$,

and antilog 1.0000 = 10.00, the required quotient.
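Division works the same way in either direction; when the dividend is the smaller number the logarithm of the quotient is negative (a barred characteristic). A sketch using the same hypothetical four-figure helpers, `log4` and `antilog4`, which imitate table rounding:

```python
import math

def log4(x: float) -> float:
    """Common logarithm rounded to four decimal places."""
    return round(math.log10(x), 4)

def antilog4(log_value: float) -> float:
    """Antilogarithm rounded to four significant figures."""
    value = 10 ** log_value
    return round(value, 3 - math.floor(math.log10(value)))

log_q1 = round(log4(213.4) - log4(21.34), 4)   # 2.3292 - 1.3292 = 1.0000
log_q2 = round(log4(21.34) - log4(213.4), 4)   # 1.3292 - 2.3292 = -1.0000, i.e. barred 1 with mantissa .0000

print(log_q1, antilog4(log_q1))   # 1.0 10.0
print(log_q2, antilog4(log_q2))   # -1.0 0.1
```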

By combining the rules for multiplication and division we can also calculate Proportions by the log method. Here, given the first three terms, we add together the logarithms of the second and third terms and from their sum subtract the logarithm of the first; the remainder is the logarithm of the fourth term. That is, if $a : b = c : d$, then $d = bc/a$ and $\log d = \log b + \log c - \log a$.

To perform Involution by logarithms we multiply the logarithm of the given number by the exponent of the power to which it is to be raised, and the product is the logarithm of the required power. Thus the square of 21.34 is calculated as follows:

$\log (21.34)^{2} = 2 \times \log 21.34 = 2 \times 1.3292 = 2.6584$,

and antilog 2.6584 = 455.4, the required square.
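Involution by logarithms, sketched with the same hypothetical four-figure helpers; `power_by_logs` is an illustrative name, not a standard function.

```python
import math

def log4(x: float) -> float:
    """Common logarithm rounded to four decimal places."""
    return round(math.log10(x), 4)

def antilog4(log_value: float) -> float:
    """Antilogarithm rounded to four significant figures."""
    value = 10 ** log_value
    return round(value, 3 - math.floor(math.log10(value)))

def power_by_logs(x: float, n: int) -> float:
    """x raised to the n-th power: multiply the four-figure log by n, then take the antilog."""
    return antilog4(round(n * log4(x), 4))

print(power_by_logs(21.34, 2))   # 455.4
print(21.34 ** 2)                # direct square, for comparison
```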
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To perform Evolution by logarithms we 
divide the logarithms of the given number 
by the exponent of the root which is to be 
extracted, and the quotient will be the 
logarithms of the required root. Thus 

\ 2\ r M = J log 2134 = i x 13292 
= 0-6646. 

Its antilog = 4-619 the required 
square root. 
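Evolution can be sketched the same way, dividing the four-figure logarithm by the index of the root. The helper names are hypothetical; the result is compared with `math.sqrt` rather than with a fixed figure, since the true root, 4.6195..., lies on the boundary of the fourth figure and a table reading may give either 4.619 or 4.620.

```python
import math

def log4(x: float) -> float:
    """Common logarithm rounded to four decimal places."""
    return round(math.log10(x), 4)

def antilog4(log_value: float) -> float:
    """Antilogarithm rounded to four significant figures."""
    value = 10 ** log_value
    return round(value, 3 - math.floor(math.log10(value)))

def root_by_logs(x: float, n: int) -> float:
    """n-th root of x: divide the four-figure log by n, then take the antilog."""
    return antilog4(round(log4(x) / n, 4))

approx = root_by_logs(21.34, 2)
print(approx, math.sqrt(21.34))   # four-figure result beside the machine square root
```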

Books invaluable for all statistical workers are:

1. Mills, Statistical Methods.

2. Yule, Introduction to the Theory of Statistics.

3. Castle, Four Figure Mathematical Tables.

4. Pearson, Tables for Statisticians and Biometricians.

5. Barlow, Tables of Squares, Cubes, Square Roots, Cube Roots and Reciprocals of all Integral Numbers up to 10,000.
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0-0 IK 

0-382-0-389 

0018 

0-372-0-377 

0019 

0-390-0-396 

0019 

0-378-0-383 

0-02M 

0-397-0-403 

0020 

0-384-0-388 

0021 

0-404-0-409 

0021 

0-389-0-393 

0022 

0-410-0-416 

0022 

0-394-0-399 

0023 

0-417-0-422 

0023 

O-400-0-404 

0024 

0-423-0-428 

0024 

0-405-0-409 

0025 

0-429-0-434 

0-025 

0-4 10-0-414 

0026 

0-435-0-440 

0026 

0-415-0-419 

0027 

0-441-0-446 

0027 

0-420-0-423 

0028 

0-447-0-452 

0028 

0-424-0-428 

0-029 

0-453-0-457 

0029 

0-429-0-432 

0030 

0-458 0-403 

0030 

0-433-0-436 

0031 

0-464-0-468 

0031 

O-437-0-441 

0032 

0-469-0-473 

0032 

0-442-0-445 

0033 

0-474-0-478 

0033 

0-4 46-0-449 

0 034 

0-479-0-483 

0034 

0-450-0-453 

0035 

0-484-0-488 

0035 

0-454-0-456 

0-036 

0-489-0-493 

0036 

0-4 57-0-400 

0037 

0-494-0-498 

0037 

0-461-0-404 

003K 

0-499-0-502 

003K 

O-465-0-467 

0039 

0-503-0-507 

0-039 

0-468-0-471 

0-040 

0-508-0-512 

0040 

0-472-0-474 

0041 

0-613-0-516 

0041 

0-475-0-478 

0-042 

0-617-0-520 

0042 

0-479-0-481 

0043 

0-621-0-625 

0043 

0-482-0-484 

0 044 

0-526-0-529 

0044 

»»*4S:>-0*48S 0*045 

0-630 0533 

0046 

0-480-0-491 

0*040 

0-534 0-537 

0046 

0-402-0-404 0-047 

0-638-0-542 

0047 

0*405-0*407 0-04K 

0-543 0-546 

1 0048 

0-408-0-500 0-040 

0*647.0*r>ri0 0-0411 


To use this table, look up the value of r in the left-hand column and add to it the corresponding value in the second column, as z is always bigger than r. To turn z into r, look up z in the third column and subtract the corresponding entry in the last column.
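The entries in this table agree with R. A. Fisher's z-transformation of a correlation coefficient, $z = \tfrac{1}{2} \log_e \frac{1+r}{1-r}$, which is the inverse hyperbolic tangent of r. Treating the table as that transformation is an assumption on our part, since the table is not named here. The sketch below, with hypothetical function names, recomputes the correction for a few rows. Near the edges of a row, simple rounding to three places can differ from the printed table by one unit in the third place.

```python
import math

def r_to_z(r: float) -> float:
    # Assumed relationship: z = 1/2 * ln((1 + r) / (1 - r)), i.e. atanh(r)
    return math.atanh(r)

def z_to_r(z: float) -> float:
    # Inverse of r_to_z
    return math.tanh(z)

# The amount to add to r, rounded to three places, for values of r taken from the table
for r in (0.386, 0.400, 0.450, 0.500):
    print(r, round(r_to_z(r) - r, 3))

# The amount to subtract from z, for values of z taken from the table
for z in (0.400, 0.530):
    print(z, round(z - z_to_r(z), 3))
```

These give 0.021, 0.024, 0.035 and 0.049 to be added, and 0.020 and 0.045 to be subtracted, matching the corresponding rows.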



